Skip to content

Commit 8a6a5e3

Browse files
committed
Fix NetAcceptAction::cancel() use-after-free race condition
Fix a race condition between NetAcceptAction::cancel() and NetAccept::acceptEvent() where the server pointer could be dereferenced after the NetAccept object was deleted. The race occurred as follows: 1. Thread T4 calls NetAcceptAction::cancel(), sets cancelled=true 2. Thread T3 running acceptEvent() sees cancelled==true 3. Thread T3 deletes the NetAccept (including embedded Server) 4. Thread T4 tries to call server->close() on freed memory The fix uses std::atomic<Server*> with atomic exchange to ensure only one thread can successfully obtain and use the server pointer. Both cancel() and the cleanup paths before delete this atomically exchange the pointer with nullptr - whichever succeeds first closes the server, the other becomes a no-op. This addresses the TODO comment that was in the code: "// TODO fix race between cancel accept and call back" ASAN report this fixes (seen intermittently on rocky CI builds): ==8850==ERROR: AddressSanitizer: heap-use-after-free on address 0x616000028cb4 at pc 0x000001346739 bp 0x7fa40fd2f580 sp 0x7fa40fd2f570 WRITE of size 4 at 0x616000028cb4 thread T4 ([ET_NET 2]) #0 0x1346738 in UnixSocket::close() ../src/iocore/eventsystem/UnixSocket.cc:138 #1 0x12b44ed in Server::close() ../src/iocore/net/Server.cc:88 #2 0x121fb95 in NetAcceptAction::cancel(Continuation*) ../src/iocore/net/P_NetAccept.h:71 #3 0x7fa41686d082 in TSActionCancel(tsapi_action*) ../src/api/InkAPI.cc:5828 ... 0x616000028cb4 is located 308 bytes inside of 576-byte region [0x616000028b80,0x616000028dc0) freed by thread T3 ([ET_NET 1]) here: #0 0x7fa416d2036f in operator delete(void*, unsigned long) #1 0x12593c4 in NetAccept::~NetAccept() ../src/iocore/net/P_NetAccept.h:128 #2 0x12bebf0 in NetAccept::acceptEvent(int, void*) ../src/iocore/net/UnixNetAccept.cc:484 ...
1 parent 05f06fa commit 8a6a5e3

10 files changed

Lines changed: 136 additions & 16 deletions

File tree

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
# Evaluate and Apply Review Comments
2+
3+
Using gh, view the review comments on the PR. Evalaute what is valuable to
4+
apply changes from and how to reply to comments that are not valuable to make
5+
changes from.
6+
7+
- Do not push replies to the PR. Exlain to me what is valuable to reply and I
8+
will evaluate your feedback.
9+
- Do any changes from the review coments.
10+
- Build: @build-ats.md.
11+
- Run any autests: @run-autests.md
12+
- Do not commit the changes, leave them unstaged for me to review.
13+

.cursor/commands/build-ats.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
# Build Apache Traffic Server
2+
3+
ATS is a cmake project that I build using ninja.
4+
5+
- I have helper scripts in ~/bin/ to run the initial cmake command and build it.
6+
- Subsequent builds, after the `build` directory is created can be built simply with cmake --build
7+
8+
## First build
9+
10+
Use this generally for the initial build:
11+
~/bin/build_ats
12+
13+
Use this if unit tests are required:
14+
~/bin/build_ats_ut
15+
16+
## Subsequent builds
17+
18+
If the `build` directory exists already, then don't waste time with a full
19+
cmake -B command via those `build_ats` scripts. Simply do the following:
20+
21+
```
22+
cmake --build build
23+
cmake --install build
24+
```
25+
26+
## Formatting
27+
28+
Before doing any commit, make sure the code is formatted. A pre-commit hook
29+
will prevent the commit if it is not formatted.
30+
31+
```
32+
cmake --build build --target format
33+
```
34+
35+
## Docs Build
36+
37+
Doc builds are easier:
38+
39+
```
40+
cmake -B docs-build --preset ci-docs
41+
cmake --build docs-build --target generate_docs -v
42+
```

.cursor/commands/commit.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
# Generate a git commit
2+
3+
When creating a git commit for a patch:
4+
5+
[ ] Run `cmake --build build --target format` if not run since last change. A pre-commit hook will prevent commits unless the code is formatted.
6+
[ ] Create a short one line summary. Probably less than 60 characters.
7+
[ ] Write a concise couple paragraphs: describe the problem being addressed and the solution.
8+
[ ] Describe the solution at a high level. If people want code details, they can look at the patch.
9+
[ ] If the commit addresses an issue, end with: `Fixes: #<issue_number>`
10+
11+
Do not push the commit unless explicitly asked to.

.cursor/commands/plan-issue.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
# Write a plan to fix an issue.
2+
3+
Using gh, view the content of the issue. Make a plan for how to address it.
4+
5+
[ ] Read the description and the relevant code. Understand the issue.
6+
[ ] Plan for how to write tests for the issue, if applicable. Consider catch2 tests as well as autests (see @writing-autests.mdc).
7+
[ ] Write the plan for the production fix.
8+
[ ] Plan to build it using /build-ats.
9+
[ ] Plan to run any added autests with /run-autests
10+
11+
Unless asked otherwise, do not post to the issue. I will do this myself.

.cursor/commands/review-pr.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
# Review a PR
2+
3+
Using gh, view the contents of the PR. Some good things to check for:
4+
5+
[ ] Expensive pass by copy rather than reference or pointer.
6+
[ ] Opportunities for using modern C++ features.
7+
[ ] Opportunities for using the lib/libswoc/ API, such as TextView.
8+
[ ] Resource leaks
9+
[ ] Duplicated code
10+
11+
And any other generic software issues you might see or issues with the
12+
domain-specific patch.

.cursor/commands/run-autests.md

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
# Run the autests
2+
3+
The autests are automated end to end tests, see @writing-autests.mdc.
4+
5+
They are not run via directly calling the python interpreter, despite being .py
6+
files. That won't work. Rather they must be run via the autest.sh script in the
7+
build directory.
8+
9+
After building and installing ATS (see @build-ats.md):
10+
11+
```
12+
cd build/tests
13+
./autest.sh --sandbox /tmp/sb --clean=none -f test_name...
14+
```
15+
16+
The `test_name` passed there is the name of the autest without the .test.py
17+
exension. Thus, to run the `show_ssl_multicert.test.py` test, you would:
18+
19+
20+
```
21+
cd build/tests
22+
./autest.sh --sandbox /tmp/sb --clean=none -f show_ssl_multicert
23+
```

src/iocore/net/P_NetAccept.h

Lines changed: 18 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@
4343
#include "iocore/net/NetAcceptEventIO.h"
4444
#include "Server.h"
4545

46+
#include <atomic>
4647
#include <vector>
4748

4849
struct NetAccept;
@@ -60,21 +61,29 @@ AcceptFunction net_accept;
6061

6162
class UnixNetVConnection;
6263

63-
// TODO fix race between cancel accept and call back
6464
struct NetAcceptAction : public Action, public RefCountObjInHeap {
65-
Server *server;
65+
std::atomic<Server *> server{nullptr};
6666

67-
void
68-
cancel(Continuation *cont = nullptr) override
67+
NetAcceptAction(Continuation *cont, Server *s)
6968
{
70-
Action::cancel(cont);
71-
server->close();
69+
continuation = cont;
70+
if (cont != nullptr) {
71+
mutex = cont->mutex;
72+
}
73+
server.store(s, std::memory_order_release);
7274
}
7375

74-
Continuation *
75-
operator=(Continuation *acont)
76+
void
77+
cancel(Continuation *cont = nullptr) override
7678
{
77-
return Action::operator=(acont);
79+
// Close the server before setting the cancelled flag. This ensures that
80+
// when acceptEvent() sees cancelled == true, the server close is already
81+
// complete, preventing use-after-free races.
82+
Server *s = server.exchange(nullptr, std::memory_order_acq_rel);
83+
if (s != nullptr) {
84+
s->close();
85+
}
86+
Action::cancel(cont);
7887
}
7988

8089
~NetAcceptAction() override

src/iocore/net/QUICNetProcessor.cc

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -251,9 +251,7 @@ QUICNetProcessor::main_accept(Continuation *cont, SOCKET fd, AcceptOptions const
251251
na->server.sock = UnixSocket{fd};
252252
ats_ip_copy(&na->server.accept_addr, &accept_ip);
253253

254-
na->action_ = new NetAcceptAction();
255-
*na->action_ = cont;
256-
na->action_->server = &na->server;
254+
na->action_ = new NetAcceptAction(cont, &na->server);
257255
na->init_accept();
258256

259257
return na->action_.get();

src/iocore/net/UnixNetAccept.cc

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -479,6 +479,7 @@ NetAccept::acceptEvent(int event, void *ep)
479479
MUTEX_TRY_LOCK(lock, m, e->ethread);
480480
if (lock.is_locked()) {
481481
if (action_->cancelled) {
482+
// Server was already closed by whoever called cancel().
482483
e->cancel();
483484
Metrics::Gauge::decrement(net_rsb.accepts_currently_open);
484485
delete this;
@@ -487,6 +488,7 @@ NetAccept::acceptEvent(int event, void *ep)
487488

488489
int res;
489490
if ((res = net_accept(this, e, false)) < 0) {
491+
action_->cancel();
490492
Metrics::Gauge::decrement(net_rsb.accepts_currently_open);
491493
/* INKqa11179 */
492494
Warning("Accept on port %d failed with error no %d", ats_ip_port_host_order(&server.addr), res);
@@ -637,7 +639,7 @@ NetAccept::acceptFastEvent(int event, void *ep)
637639
return EVENT_CONT;
638640

639641
Lerror:
640-
server.close();
642+
action_->cancel();
641643
e->cancel();
642644
Metrics::Gauge::decrement(net_rsb.accepts_currently_open);
643645
delete this;
@@ -656,6 +658,7 @@ NetAccept::acceptLoopEvent(int event, Event *e)
656658
}
657659

658660
// Don't think this ever happens ...
661+
action_->cancel();
659662
Metrics::Gauge::decrement(net_rsb.accepts_currently_open);
660663
delete this;
661664
return EVENT_DONE;

src/iocore/net/UnixNetProcessor.cc

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -133,9 +133,7 @@ UnixNetProcessor::accept_internal(Continuation *cont, int fd, AcceptOptions cons
133133
na->proxyPort = sa ? sa->proxyPort : nullptr;
134134
na->snpa = dynamic_cast<SSLNextProtocolAccept *>(cont);
135135

136-
na->action_ = new NetAcceptAction();
137-
*na->action_ = cont;
138-
na->action_->server = &na->server;
136+
na->action_ = new NetAcceptAction(cont, &na->server);
139137

140138
if (opt.frequent_accept) { // true
141139
if (accept_threads > 0 && listen_per_thread == 0) {

0 commit comments

Comments
 (0)