Commit 6787e40

Merge pull request #22 from Shibin-Ez/master
Epoll doc final format
2 parents dbc9267 + fddb4cb commit 6787e40

2 files changed

Lines changed: 26 additions & 16 deletions

docs/guides/resources/linux-epoll.md

Lines changed: 20 additions & 16 deletions
@@ -106,7 +106,7 @@ This call returns a file descriptor representing the epoll instance. The descrip
106106
107107
- **0**: Creates the epoll instance without setting FD_CLOEXEC. In this case, the epoll file descriptor is inherited across fork() + exec(), which can unintentionally leak the instance into child processes.
108108
109-
**In almost all applicationsespecially servers and multi-process programsEPOLL_CLOEXEC should be used to avoid file descriptor leaks.**
109+
**In almost all applications, especially servers and multi-process programs, EPOLL_CLOEXEC should be used to avoid file descriptor leaks.**
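The effect of the flag can be checked directly. The following is a minimal sketch, not part of the guide itself: the helper names `has_cloexec()` and `epoll_cloexec_state()` are hypothetical, and the code only assumes the standard `epoll_create1()` and `fcntl(F_GETFD)` calls on Linux.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set on fd,
 * 0 if it is not, and -1 if fcntl() fails. */
int has_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical helper: creates an epoll instance with the given flags
 * (EPOLL_CLOEXEC or 0), reports whether close-on-exec is set on the
 * resulting descriptor, then closes it. Returns -1 on error. */
int epoll_cloexec_state(int create_flags) {
    int epfd = epoll_create1(create_flags);
    if (epfd == -1) {
        return -1;
    }
    int state = has_cloexec(epfd);
    close(epfd);
    return state;
}
```

Calling `epoll_cloexec_state(EPOLL_CLOEXEC)` should report the flag as set, while `epoll_cloexec_state(0)` should report it as clear, which is the case where the descriptor would survive an `exec()` in a child process.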
110110
111111
### **Registering a File Descriptor with Epoll**: `epoll_ctl()`
112112
@@ -152,13 +152,13 @@ union epoll_data {
152152

153153
typedef union epoll_data epoll_data_t;
154154
```
155+
155156
::: tip NOTE
156157

157158
The `epoll_data_t` union is defined and passed to the kernel by the programmer. So, when the kernel returns an event, this data can be used to identify which file descriptor triggered it—for example, to distinguish between a listening socket and a connection socket. In later stages, we will use the `void *ptr` field instead of the file descriptor to handle this logic.
158159

159160
:::
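As a rough illustration of the note above, the sketch below stores a file descriptor in `data.fd` at registration time and reads it back from the event returned by `epoll_wait()`. The function name `identify_ready_fd()` is hypothetical, and a pipe stands in for the listening and connection sockets discussed in the guide.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: registers the read end of a pipe with data.fd set,
 * makes it readable, and checks that the returned event carries the same
 * descriptor. Returns 1 on a match, 0 on a mismatch, -1 on error. */
int identify_ready_fd(void) {
    int pipefd[2];
    if (pipe(pipefd) == -1) {
        return -1;
    }
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];   /* user data: copied by the kernel, returned as-is */

    int result = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
        write(pipefd[1], "x", 1) == 1) {
        struct epoll_event out[1];
        int n = epoll_wait(epfd, out, 1, 0);   /* timeout 0: do not block */
        if (n == 1) {
            /* The kernel hands back exactly what was stored at registration. */
            result = (out[0].data.fd == pipefd[0]) ? 1 : 0;
        }
    }

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

A server would typically compare `out[i].data.fd` against its listening socket to decide between calling `accept()` and handling data on an existing connection.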
160161

161-
162162
When `epoll_ctl()` is called:
163163

164164
- The kernel **copies the contents** of this structure and stores the copied data internally as part of the epoll instance.
@@ -307,25 +307,29 @@ While the process is sleeping, monitored file descriptors may change state. When
307307
308308
Once the sleeping process is woken, execution resumes inside `epoll_wait()`. The kernel again acquires the epoll lock and rechecks the ready list. If entries are present, the kernel iterates over them and copies their associated `struct epoll_event` data into the user-space `events` array.
309309
310-
After copying events, the kernel updates the ready list based on the triggering mode. In level-triggered mode, entries may remain in the ready list if the file descriptor is still ready. In edge-triggered mode, entries are removed and will only be reinserted when a new state change occurs.
310+
After copying events, the kernel updates the ready list based on the triggering mode. This will be explained below.
311311
312312
Finally, the kernel releases the epoll lock and returns to user space, with `epoll_wait()` returning the number of events copied into the `events` array.
313313
314314
Note: `epoll_wait()` itself does not detect I/O readiness; it only consumes events that were previously recorded in the ready list by kernel callbacks.
315315
316-
## Level triggered mode
316+
## Level triggered mode (LT)
317+
318+
In level-triggered mode, epoll reports a file/socket descriptor as ready as long as the readiness condition persists. For EPOLLIN, readiness indicates that unread data is present in the kernel’s receive buffer; for EPOLLOUT, it indicates available space in the kernel’s send buffer. The file descriptor is returned by epoll_wait() on every call while these conditions remain true.
319+
320+
That is, if a file/socket is reported as ready to read by epoll_wait(), the user-space code will read data from it into a user buffer. Suppose the user buffer fills up while unread data remains in the file's/socket's kernel buffer. In level-triggered mode, the descriptor stays in the ready list, and the next call to epoll_wait() will again report the file/socket as ready to read. On the other hand, if all the data in the kernel buffer is read into user space, the file/socket is removed from the ready list (and re-enters it only when new data arrives).
317321
318-
In level-triggered mode, epoll reports a file descriptor as ready as long as the readiness condition persists. For EPOLLIN, readiness indicates that unread data is present in the kernel’s receive buffer; for EPOLLOUT, it indicates available space in the kernel’s send buffer. The file descriptor is returned by epoll_wait() on every call while these conditions remain true. Reading from the descriptor removes data from the receive buffer, and the descriptor continues to be reported until the buffer is fully drained. Similarly, writable events continue to be reported until the send buffer becomes full.
322+
If a file/socket is reported as ready for writing, it stays in the ready list and is reported on each invocation of epoll_wait() for as long as its kernel buffer has free space.
319323
320-
As a result, level-triggered mode reflects the current I/O state of the file descriptor rather than changes in that state, which can lead to repeated notifications if the application does not complete the required I/O operations.
324+
As a result, level-triggered mode reflects the current I/O state of the file/socket descriptor rather than changes in that state, which can lead to repeated notifications if the application does not complete the required I/O operations.
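The partial-read behaviour of level-triggered mode can be observed with a small experiment. This is a sketch under stated assumptions: a pipe stands in for a socket, and the names `poll_once()` and `lt_partial_read_demo()` are hypothetical helpers, not part of any library.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Polls the epoll instance once without blocking and returns the number
 * of ready descriptors (0 or 1 in this demo), or -1 on error. */
static int poll_once(int epfd) {
    struct epoll_event out[1];
    return epoll_wait(epfd, out, 1, 0);
}

/* Hypothetical demo of level-triggered readiness on a pipe.
 * results[0]: ready count after 8 bytes are written
 * results[1]: ready count after only 4 of them are read
 * results[2]: ready count after the remaining 4 are read
 * Returns 0 on success, -1 on error. */
int lt_partial_read_demo(int results[3]) {
    int pipefd[2];
    char buf[4];
    if (pipe(pipefd) == -1) {
        return -1;
    }
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;      /* no EPOLLET: level-triggered is the default */
    ev.data.fd = pipefd[0];

    int rc = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
        write(pipefd[1], "12345678", 8) == 8) {
        results[0] = poll_once(epfd);              /* data present */
        if (read(pipefd[0], buf, 4) == 4) {
            results[1] = poll_once(epfd);          /* 4 bytes still unread */
            if (read(pipefd[0], buf, 4) == 4) {
                results[2] = poll_once(epfd);      /* buffer fully drained */
                rc = 0;
            }
        }
    }

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

The descriptor should be reported for as long as unread bytes remain, and stop being reported once the pipe is empty.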
321325
322-
## Edge triggered mode
326+
## Edge triggered mode (ET)
323327
324328
In edge-triggered mode, epoll reports events only when the readiness state changes (for example, when new data arrives on a socket that was previously empty). Once the event is delivered, epoll will not notify again until another state change occurs.
325329
326-
Because **ET does not repeat events**, the application must read or write until the operation returns `EAGAIN`; otherwise, data may remain unread with no further notifications.
330+
Because **edge-triggered mode does not repeat events**, the application must keep reading from or writing to the file/socket until the kernel buffer is empty (for reads) or full (for writes), which is identified when the read()/write()/recv()/send() system call returns `EAGAIN`. In edge-triggered mode, a descriptor that has been reported once by epoll_wait() to user space does not stay in the ready list. A monitored descriptor re-enters the ready list only on a new state change, such as new data arriving (at which point ep_poll_callback() places the file descriptor in the ready list again).
327331
328-
ET reduces unnecessary wakeups and is useful for high-performance servers, but requires more careful programming. This mode is enabled by passing the `EPOLLET` flag when registering the file descriptor with `epoll_ctl()`.
332+
Edge-triggered mode reduces unnecessary wakeups and is useful for high-performance servers, but requires more careful application-side programming. This mode is enabled by passing the `EPOLLET` flag when registering the file descriptor with `epoll_ctl()`.
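A common way to meet the "read until `EAGAIN`" requirement is a drain loop on a non-blocking descriptor. The sketch below is illustrative only: `drain_fd()` and `et_drain_demo()` are hypothetical names, a pipe stands in for a socket, and the 5000-byte payload and 1024-byte buffer are arbitrary choices.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Reads from a non-blocking fd until read() fails with EAGAIN.
 * Returns the total number of bytes read, or -1 on any other error. */
long drain_fd(int fd) {
    char buf[1024];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;              /* EOF: writer closed */
        if (errno == EINTR) continue;          /* interrupted: retry */
        if (errno == EAGAIN) return total;     /* kernel buffer is empty */
        return -1;
    }
}

/* Hypothetical edge-triggered demo on a pipe.
 * results[0]: ready count after 5000 bytes are written
 * results[1]: bytes read by drain_fd()
 * results[2]: ready count after draining (no new data has arrived)
 * Returns 0 on success, -1 on error. */
int et_drain_demo(long results[3]) {
    static const char payload[5000] = {0};
    int pipefd[2];
    if (pipe(pipefd) == -1) return -1;

    int epfd = epoll_create1(EPOLL_CLOEXEC);
    int fl = fcntl(pipefd[0], F_GETFL);
    int rc = -1;

    /* Edge-triggered mode needs a non-blocking descriptor so that the
     * final read() returns EAGAIN instead of blocking. */
    if (epfd != -1 && fl != -1 &&
        fcntl(pipefd[0], F_SETFL, fl | O_NONBLOCK) == 0) {
        struct epoll_event ev = {0}, out[1];
        ev.events = EPOLLIN | EPOLLET;
        ev.data.fd = pipefd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], payload, sizeof payload) == (ssize_t)sizeof payload) {
            results[0] = epoll_wait(epfd, out, 1, 0);
            results[1] = drain_fd(pipefd[0]);
            results[2] = epoll_wait(epfd, out, 1, 0);
            rc = 0;
        }
    }

    if (epfd != -1) close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

After the drain, a second `epoll_wait()` with a zero timeout should report nothing, because no new state change has occurred since the event was delivered.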
329333
330334
::: tip NOTE
331335
`EAGAIN` is a common error code returned by non-blocking I/O operations (e.g., `read`, `write`, `recv`, `send`) when the operation cannot be completed immediately without blocking the calling process. In the context of `epoll` with non-blocking sockets, especially in Edge-Triggered mode, receiving `EAGAIN` indicates that there is no more data to read or the write buffer is full, and you should stop attempting the operation until a new event is reported by `epoll_wait()`.
@@ -335,11 +339,11 @@ ET reduces unnecessary wakeups and is useful for high-performance servers, but r
335339
336340
Each `epitem` (monitored FD) transitions through three stages:
337341
338-
| Stage | Description |
339-
| :------------- | :----------------------------------------------- |
340-
| **Registered** | In red-black tree, not ready yet |
341-
| **Ready** | Added to ready list after kernel callback |
342-
| **Delivered** | Returned by `epoll_wait()`, removed or re-queued |
342+
| Stage | Description |
343+
| :------------- | :---------------------------------------- |
344+
| **Registered** | In red-black tree, not ready yet |
345+
| **Ready** | Added to ready list after kernel callback |
346+
| **Delivered** | Returned by `epoll_wait()` to user space |
343347
344348
## Lifecycle of an Epoll Readiness Event
345349
@@ -385,6 +389,8 @@ A spinlock is a low-level kernel synchronization primitive used to protect share
385389
386390
Spinlocks are used in epoll because parts of the epoll subsystem, including callbacks, may execute in interrupt or softirq context where sleeping is not allowed. Epoll uses spinlocks to protect short critical sections, such as updates to the ready list or internal bookkeeping structures. Because spinning consumes CPU time, spinlocks must be held only for very short durations.
387391
392+
Note: Epoll's use of spinlocks happens entirely within the kernel; the application is not involved.
393+
388394
## Interrupt Context (Hard Interrupt Context)
389395
390396
Interrupt context is a kernel execution context entered when the CPU receives a hardware interrupt from a device such as a network card, disk controller, or timer. When an interrupt occurs, the CPU immediately suspends the currently running code, switches to kernel mode if necessary, and begins executing the registered interrupt handler.
@@ -393,8 +399,6 @@ Code running in interrupt context is not associated with any user process or ker
393399
394400
In the context of epoll, interrupt handlers do not directly invoke epoll logic. Instead, they typically schedule deferred work that later leads to epoll callbacks being triggered.
395401
396-
---
397-
398402
## Softirq Context
399403
400404
Softirq context is a kernel execution context used to perform deferred work that was triggered by a hardware interrupt but could not be completed safely or efficiently within the interrupt handler itself. Softirqs allow the kernel to defer processing while still executing soon after the interrupt and without sleeping.

docs/roadmap/phase-0/stage-5-a.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -153,6 +153,12 @@ void accept_connection(int listen_sock_fd) {
153153
154154
Try and implement the function `connect_upstream()` to create a connection to the upstream server.
155155
156+
:::tip NOTE
157+
158+
Since the upstream server is a Python file server, concurrent connections are handled reliably by its own implementation.
159+
160+
:::
161+
156162
```c
157163
int connect_upstream() {
158164
