Skip to content

Commit ed85a1c

Browse files
etrclaude
andcommitted
TASK-094: extend threadsafety_stress with per-resource add_hook CAS race
Adds Sub-test D (per_resource_add_hook_first_call_cas_no_data_race) and extends Sub-test A's driver_body switch from 6 ops to 8 ops to cover the per-resource (middle-tier) hook bus that the existing test header was claiming but not actually exercising. Sub-test D structurally proves the contended-null window in http_resource::ensure_table() (src/http_resource.cpp:93-110) is entered: 64 iterations x 8 racing threads, each iteration synchronised on a std::latch so workers enter add_hook concurrently against a freshly-null hook_table_ slot. The HTTPSERVER_COMPILATION-gated hook_table_raw_() accessor witnesses pre-race null (none installed) -> post-race non-null (exactly one CAS-arbitrated table installed). A per-iteration "entered-after-latch" counter pins the structural CAS-contention proof (>=2 workers visible inside the contended-null window). Total CAS races per run: 512 (shape-bounded; ~5ms warm, ~25ms TSan). Sub-test A's driver_body now also fires: - op 6: fresh stack-local cas_witness_resource per call -> contended-null branch of ensure_table() driven from inside an MHD handler thread while route_table_mutex_ is held shared by dispatch (the documented 3-tier lock order in a single moment). - op 7: shared cas_witness_resource captured by reference -> load-acquire short-circuit branch ("if (existing) return existing;") complementing op 6's contended-null branch. Two new counters (cas_resource_hook_ok, cas_existing_hook_ok) are pinned by LT_CHECK_GT alongside the existing TASK-032/052 counters. The Sub-test A header comment is corrected to stop overclaiming per-resource coverage; an explicit Sub-test D header describes the new coverage. grep -n "resource hook_table" returns three comment lines accurately describing the lock-order claim (AC 3). src/http_resource.cpp is bit-identical (AC 4). No library rebuild required (HTTPSERVER_COMPILATION already in test/Makefile.am AM_CPPFLAGS). No instrumentation macro added; the witness is purely test-side. Manual TSan validation on macOS (Apple clang) confirmed zero data-race warnings; the CI Linux libstdc++ TSan lane re-runs this binary via the existing build-type: tsan matrix entry in .github/workflows/verify-build.yml without workflow changes. Local: HTTPSERVER_STRESS_SECONDS=2 make check -j1 -> 98/98 PASS. Sub-test D INFO line: iters=64 threads_per_iter=8 total_post_race_nonnull=64 contended_window_iters=64. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 9b86e8f commit ed85a1c

1 file changed

Lines changed: 261 additions & 18 deletions

File tree

test/integ/threadsafety_stress.cpp

Lines changed: 261 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -18,16 +18,23 @@
1818
USA
1919
*/
2020

21-
// TASK-032 + TASK-052: DR-008 thread-safety contract stress test.
21+
// TASK-032 + TASK-052 + TASK-094: DR-008 thread-safety contract stress test.
2222
//
2323
// Sub-test A — concurrent_register_block_from_handlers_no_data_race
2424
// Drives the PUBLIC mutating surface (register_path, unregister_path,
25-
// block_ip, unblock_ip) AND, as of TASK-052, the lifecycle-hook
26-
// registration surface (webserver::add_hook + hook_handle::remove)
27-
// from inside MHD handler threads against a running webserver. 16
28-
// curl clients × N seconds at default port 0 (kernel-assigned).
29-
// The hook ops exercise the documented `route_table_mutex_ → resource
30-
// hook_table → server-wide hook_table` lock order under TSan.
25+
// block_ip, unblock_ip) AND, as of TASK-052, the **server-wide**
26+
// lifecycle-hook registration surface (webserver::add_hook +
27+
// hook_handle::remove) AND, as of TASK-094, the **per-resource**
28+
// hook bus via http_resource::add_hook on both a fresh stack-local
29+
// resource (contended-null branch in ensure_table()) and a shared
30+
// resource whose hook_table_ is already installed (load-acquire
31+
// short-circuit branch). 16 curl clients × N seconds at default port
32+
// 0 (kernel-assigned).
33+
// The hook ops here exercise the server-wide tier AND, with TASK-094,
34+
// the resource (middle) tier of the documented
35+
// `route_table_mutex_ → resource hook_table → server-wide hook_table`
36+
// lock order under TSan. Standalone per-resource CAS-race coverage
37+
// lives in Sub-test D (TASK-094).
3138
// The TSan-clean rerun is the headline acceptance:
3239
// `make clean && CXXFLAGS=-fsanitize=thread … && make check` re-runs
3340
// this binary under TSan via the CI matrix entry `build-type: tsan`
@@ -70,6 +77,7 @@
7077
#include <cstdlib>
7178
#include <functional>
7279
#include <iostream>
80+
#include <latch>
7381
#include <memory>
7482
#include <mutex>
7583
#include <random>
@@ -112,6 +120,8 @@ struct OpCounters {
112120
std::atomic<int> unblock_ok{0};
113121
std::atomic<int> hook_add_ok{0}; // TASK-052
114122
std::atomic<int> hook_remove_ok{0}; // TASK-052
123+
std::atomic<int> cas_resource_hook_ok{0}; // TASK-094 op 6: contended-null branch
124+
std::atomic<int> cas_existing_hook_ok{0}; // TASK-094 op 7: load-acquire short-circuit branch
115125
std::atomic<int> handler_calls{0};
116126
};
117127

@@ -133,6 +143,19 @@ class noop_resource : public ht::http_resource {
133143
}
134144
};
135145

146+
// TASK-094: dedicated subclass for Sub-test D so a single stack frame
147+
// can re-arm the lazy `hook_table_` CAS for every fresh iteration. The
148+
// subclass adds no new state; render_get is unreachable in Sub-test D
149+
// (no MHD daemon is started against it — the CAS race is exercised
150+
// purely from worker threads calling `add_hook` directly).
151+
class cas_witness_resource : public ht::http_resource {
152+
public:
153+
ht::http_response render_get(
154+
const ht::http_request&) override {
155+
return ht::http_response::string("ok");
156+
}
157+
};
158+
136159
// TASK-052: register an armed hook on a randomly-chosen phase. Each
137160
// hook overload is a distinct std::function<...> signature, so the
138161
// phase selection happens at compile time via this switch (the runtime
@@ -206,7 +229,8 @@ ht::hook_handle install_random_hook(ht::webserver* ws, unsigned phase_idx) {
206229
// register_path's contract — not a data race).
207230
ht::http_response driver_body(const ht::http_request& req,
208231
ht::webserver* ws, OpCounters* counters,
209-
HookBag* hooks) {
232+
HookBag* hooks,
233+
ht::http_resource* shared_cas_resource) {
210234
counters->handler_calls.fetch_add(1, std::memory_order_relaxed);
211235

212236
int op = 0;
@@ -227,14 +251,19 @@ ht::http_response driver_body(const ht::http_request& req,
227251
const std::string ip =
228252
"198.51.100." + std::to_string(i & (kIpRange - 1));
229253

230-
// Six ops total: 0..3 are the TASK-032 route-table / ban-list
231-
// mutators; 4..5 are TASK-052's hook bus churn. `op % 6` keeps
232-
// the distribution roughly uniform.
254+
// Eight ops total: 0..3 are the TASK-032 route-table / ban-list
255+
// mutators; 4..5 are TASK-052's webserver-side hook bus churn;
256+
// 6..7 are TASK-094's per-resource hook bus churn (op 6 hits the
257+
// contended-null branch of ensure_table() on a fresh stack-local
258+
// cas_witness_resource; op 7 hits the load-acquire short-circuit
259+
// branch on the shared_cas_resource whose hook_table_ is already
260+
// installed after the first op-7 call lands). `op % 8` keeps the
261+
// distribution roughly uniform.
233262
// NOTE: register_prefix / unregister_prefix are intentionally not
234263
// exercised here because they share the same lock path as
235264
// register_path / unregister_path (register_impl_ with family=true
236265
// vs false); the existing cases already cover the mutex contention.
237-
switch (op % 6) {
266+
switch (op % 8) {
238267
case 0:
239268
try {
240269
ws->register_path(dyn_path,
@@ -308,6 +337,48 @@ ht::http_response driver_body(const ht::http_request& req,
308337
}
309338
break;
310339
}
340+
case 6: {
341+
// TASK-094: per-resource CAS race — fresh stack-local
342+
// resource whose hook_table_ slot is null. While this
343+
// handler is in flight, register_path / unregister_path
344+
// (cases 0/1) on other threads are holding
345+
// route_table_mutex_ shared, so this case exercises the
346+
// full three-tier order
347+
// route_table_mutex_ (shared, this thread is a reader
348+
// inside dispatch) -> resource hook_table_ (this
349+
// thread's CAS in ensure_table()) -> server-wide
350+
// hook_table (other threads' webserver::add_hook in
351+
// cases 4/5).
352+
// The hook_handle is dropped at end of scope so the
353+
// resource's destructor at end of scope runs against a
354+
// weak_ptr that expires cleanly.
355+
cas_witness_resource transient;
356+
ht::hook_handle h = transient.add_hook(
357+
ht::hook_phase::request_completed,
358+
std::function<void(const ht::request_completed_ctx&)>(
359+
[](const ht::request_completed_ctx&) {}));
360+
(void)h;
361+
counters->cas_resource_hook_ok.fetch_add(
362+
1, std::memory_order_relaxed);
363+
break;
364+
}
365+
case 7: {
366+
// TASK-094: shared resource whose hook_table_ is
367+
// installed after the first op-7 call. Subsequent calls
368+
// take the load-acquire short-circuit branch in
369+
// ensure_table() (`if (existing) return existing;`),
370+
// complementing case 6's contended-null branch. The
371+
// handle is dropped immediately; remove() runs against
372+
// the still-alive resource at end of scope.
373+
ht::hook_handle h = shared_cas_resource->add_hook(
374+
ht::hook_phase::request_completed,
375+
std::function<void(const ht::request_completed_ctx&)>(
376+
[](const ht::request_completed_ctx&) {}));
377+
h.remove();
378+
counters->cas_existing_hook_ok.fetch_add(
379+
1, std::memory_order_relaxed);
380+
break;
381+
}
311382
}
312383

313384
return ht::http_response::string("ok");
@@ -342,6 +413,12 @@ LT_BEGIN_AUTO_TEST(threadsafety_stress_suite,
342413
concurrent_register_block_from_handlers_no_data_race)
343414
OpCounters counters;
344415
HookBag hooks; // TASK-052: bag retained for the duration of the test
416+
// TASK-094: shared per-resource CAS target for op 7. The first
417+
// op-7 call installs the hook_table_ via the contended-null path;
418+
// every subsequent op-7 call across all client threads takes the
419+
// load-acquire short-circuit branch in ensure_table(), driving
420+
// concurrent registration + dispatch on the middle tier.
421+
cas_witness_resource shared_cas_resource;
345422

346423
// Port 0 lets the kernel pick a free port; read it back via
347424
// get_bound_port() to avoid hard-coded-port collisions when this
@@ -353,8 +430,10 @@ LT_BEGIN_AUTO_TEST(threadsafety_stress_suite,
353430
.max_threads(8)};
354431

355432
ws.on_get("/driver",
356-
[&ws, &counters, &hooks](const ht::http_request& req) {
357-
return driver_body(req, &ws, &counters, &hooks);
433+
[&ws, &counters, &hooks, &shared_cas_resource](
434+
const ht::http_request& req) {
435+
return driver_body(req, &ws, &counters, &hooks,
436+
&shared_cas_resource);
358437
});
359438

360439
ws.start(false);
@@ -386,8 +465,10 @@ LT_BEGIN_AUTO_TEST(threadsafety_stress_suite,
386465
std::mt19937 rng(static_cast<uint32_t>(c) * 0x9e3779b9u);
387466

388467
while (!stop.load(std::memory_order_relaxed)) {
389-
// Six ops: 0..3 route-table/ban; 4..5 hook bus (TASK-052).
390-
const int op = static_cast<int>(rng() % 6u);
468+
// Eight ops: 0..3 route-table/ban; 4..5 webserver-side
469+
// hook bus (TASK-052); 6..7 per-resource hook bus
470+
// (TASK-094 — contended-null + load-acquire branches).
471+
const int op = static_cast<int>(rng() % 8u);
391472
const int i = static_cast<int>(rng() & (kIpRange - 1));
392473
const std::string url =
393474
base + "?op=" + std::to_string(op) +
@@ -415,17 +496,20 @@ LT_BEGIN_AUTO_TEST(threadsafety_stress_suite,
415496
for (auto& t : clients) t.join();
416497

417498
// Acceptance criterion: 60 s of concurrent register/lookup/block.
418-
// We don't pin exact counts — the gate is "all six mutating ops
499+
// We don't pin exact counts — the gate is "all eight mutating ops
419500
// executed at least once without deadlock or crash, and (under the
420501
// documented TSan rebuild) no data race fired". TASK-052 adds the
421-
// hook add/remove pair to the same gate.
502+
// webserver-side hook add/remove pair; TASK-094 adds the per-
503+
// resource CAS-driven pair to the same gate.
422504
LT_CHECK_GT(counters.handler_calls.load(), 0);
423505
LT_CHECK_GT(counters.register_ok.load(), 0); // TASK-032
424506
LT_CHECK_GT(counters.unregister_ok.load(), 0); // TASK-032
425507
LT_CHECK_GT(counters.block_ok.load(), 0); // TASK-032
426508
LT_CHECK_GT(counters.unblock_ok.load(), 0); // TASK-032
427509
LT_CHECK_GT(counters.hook_add_ok.load(), 0); // TASK-052
428510
LT_CHECK_GT(counters.hook_remove_ok.load(), 0); // TASK-052
511+
LT_CHECK_GT(counters.cas_resource_hook_ok.load(), 0); // TASK-094
512+
LT_CHECK_GT(counters.cas_existing_hook_ok.load(), 0); // TASK-094
429513

430514
// Drain the hook bag explicitly before the webserver stops so the
431515
// hook_handle destructors run while the impl is still alive. (The
@@ -770,6 +854,165 @@ LT_BEGIN_AUTO_TEST(threadsafety_stress_suite,
770854
ws.stop();
771855
LT_END_AUTO_TEST(adversarial_segments_registration_no_latency_spike)
772856

857+
// ---------------------------------------------------------------------
858+
// Sub-test D — per_resource_add_hook_first_call_cas_no_data_race
859+
// (TASK-094).
860+
//
861+
// Targets the lazy CAS path in `http_resource::ensure_table()`
862+
// (`src/http_resource.cpp:93-110`) that the M5 hook bus added.
863+
// Constructs `kRepeats` fresh `cas_witness_resource` subclasses; for
864+
// each, releases `kThreads` (>= 8) racing add_hook callers on a
865+
// std::latch. The contended-null window in `ensure_table()` is
866+
// observably entered (witnessed via the HTTPSERVER_COMPILATION-gated
867+
// `hook_table_raw_()` accessor declared in `http_resource.hpp` — pre-
868+
// race null reads on the main thread pin the structural property that
869+
// every iteration starts with `hook_table_` == nullptr). After the
870+
// race, the worker threads exercise mixed add/remove churn on the
871+
// now-installed `resource_hook_table` to drive concurrent registration
872+
// + dispatch on the middle tier.
873+
//
874+
// This phase **completes** the
875+
// `route_table_mutex_ -> resource hook_table -> server-wide hook_table`
876+
// lock-order claim that Sub-test A makes only for the bookend tiers
877+
// (server-wide via webserver::add_hook, and route_table_mutex_ via
878+
// register_path / unregister_path).
879+
//
880+
// Wall-clock: shape-bounded (~1-5 s under TSan slowdown); no
881+
// HTTPSERVER_STRESS_SECONDS integration. Total CAS races per run =
882+
// kRepeats * kThreads = 64 * 8 = 512.
883+
// ---------------------------------------------------------------------
884+
885+
LT_BEGIN_AUTO_TEST(threadsafety_stress_suite,
886+
per_resource_add_hook_first_call_cas_no_data_race)
887+
constexpr int kRepeats = 64;
888+
constexpr int kThreads = 8;
889+
890+
// Cumulative witnesses across all iterations. The accumulator is
891+
// the falsifiable gate for the cycle: it must reach kRepeats (each
892+
// iteration must drive at least one successful add_hook installing
893+
// the table). On the cycle-1 RED step the worker threads do not
894+
// call add_hook, so total_post_race_nonnull stays at 0 and the
895+
// LT_CHECK_GT fails as designed.
896+
std::atomic<int> total_post_race_nonnull{0};
897+
// Cumulative count of iterations that observed >= 2 workers
898+
// entering add_hook simultaneously after the latch released —
899+
// structural proof that the contended-null window was reached
900+
// (per design consideration 4 in the plan).
901+
std::atomic<int> contended_window_iters{0};
902+
903+
for (int iter = 0; iter < kRepeats; ++iter) {
904+
cas_witness_resource r;
905+
906+
// Pre-race witness: the resource starts with hook_table_ ==
907+
// nullptr. hook_table_raw_() is HTTPSERVER_COMPILATION-gated;
908+
// the test TU defines that macro via test/Makefile.am
909+
// AM_CPPFLAGS, so the accessor is reachable here.
910+
LT_CHECK(r.hook_table_raw_() == nullptr);
911+
912+
// +1 for the main thread; the latch is released only after
913+
// every worker AND the main thread have arrived. This biases
914+
// all workers to enter add_hook on (approximately) the same
915+
// instruction cycle, maximising the chance the racing CAS
916+
// hits the contended-null window.
917+
std::latch start(kThreads + 1);
918+
919+
// entered_after_latch: per-iteration counter incremented by
920+
// every worker the instant it returns from arrive_and_wait().
921+
// Inspected immediately after main joins all workers to count
922+
// iterations that fielded >= 2 concurrent add_hook entries.
923+
std::atomic<int> entered_after_latch{0};
924+
925+
// Per-iteration handle bag: every worker pushes its returned
926+
// hook_handle here under handles_mtx so the registrations
927+
// outlive the iteration body just long enough for the post-
928+
// race witness read. The bag is destroyed at the end of the
929+
// iteration BEFORE `r` exits scope so each handle's dtor
930+
// runs against a still-live resource (mirrors the lifetime
931+
// pattern in hooks_per_route_resource_destroyed_first).
932+
std::mutex handles_mtx;
933+
std::vector<ht::hook_handle> handles;
934+
handles.reserve(static_cast<std::size_t>(kThreads));
935+
936+
std::vector<std::thread> workers;
937+
workers.reserve(kThreads);
938+
for (int t = 0; t < kThreads; ++t) {
939+
workers.emplace_back([&] {
940+
start.arrive_and_wait();
941+
entered_after_latch.fetch_add(
942+
1, std::memory_order_relaxed);
943+
// Race the lazy CAS in ensure_table(): every worker
944+
// calls the same add_hook overload on the freshly-
945+
// null `r.hook_table_` slot. Exactly one thread's
946+
// make_shared+compare_exchange wins; the rest take
947+
// either the load-acquire short-circuit (winner has
948+
// already published) or the genuine CAS-loser
949+
// branch (`expected` updated by the failed exchange).
950+
ht::hook_handle h = r.add_hook(
951+
ht::hook_phase::request_completed,
952+
std::function<void(const ht::request_completed_ctx&)>(
953+
[](const ht::request_completed_ctx&) {}));
954+
std::lock_guard<std::mutex> lk(handles_mtx);
955+
handles.push_back(std::move(h));
956+
});
957+
}
958+
start.arrive_and_wait();
959+
for (auto& th : workers) th.join();
960+
961+
// Post-race mixed add/remove burst (action item 1.c). Each
962+
// burst hits the now-installed resource_hook_table from
963+
// multiple threads, exercising concurrent registration +
964+
// dispatch contention on the middle tier of the lock order.
965+
constexpr int kBurstThreads = 4;
966+
constexpr int kBurstOpsPerThread = 8;
967+
std::vector<std::thread> burst;
968+
burst.reserve(kBurstThreads);
969+
for (int b = 0; b < kBurstThreads; ++b) {
970+
burst.emplace_back([&] {
971+
for (int op = 0; op < kBurstOpsPerThread; ++op) {
972+
ht::hook_handle h = r.add_hook(
973+
ht::hook_phase::request_completed,
974+
std::function<void(
975+
const ht::request_completed_ctx&)>(
976+
[](const ht::request_completed_ctx&) {}));
977+
h.remove();
978+
}
979+
});
980+
}
981+
for (auto& th : burst) th.join();
982+
983+
// Drain the post-race handle bag before `r` exits scope so
984+
// every hook_handle's dtor runs against a still-live
985+
// resource. This is the same lifetime ordering enforced by
986+
// hooks_per_route_resource_destroyed_first.
987+
handles.clear();
988+
989+
if (entered_after_latch.load(std::memory_order_relaxed) >= 2) {
990+
contended_window_iters.fetch_add(
991+
1, std::memory_order_relaxed);
992+
}
993+
if (r.hook_table_raw_() != nullptr) {
994+
total_post_race_nonnull.fetch_add(
995+
1, std::memory_order_relaxed);
996+
}
997+
}
998+
999+
std::cout << "[INFO] per_resource CAS: iters=" << kRepeats
1000+
<< " threads_per_iter=" << kThreads
1001+
<< " total_post_race_nonnull="
1002+
<< total_post_race_nonnull.load()
1003+
<< " contended_window_iters="
1004+
<< contended_window_iters.load() << "\n";
1005+
1006+
// Falsifiable gate: every iteration must install a table (some
1007+
// thread must have driven ensure_table() to completion). On the
1008+
// cycle-1 RED step this is 0 — proves the witness mechanism is
1009+
// wired and observably distinguishes install vs no-install.
1010+
LT_CHECK_GT(total_post_race_nonnull.load(), 0);
1011+
// Structural CAS-contention gate (cycle 2): at least one iteration
1012+
// must have observed >= 2 workers entering add_hook concurrently.
1013+
LT_CHECK_GT(contended_window_iters.load(), 0);
1014+
LT_END_AUTO_TEST(per_resource_add_hook_first_call_cas_no_data_race)
1015+
7731016
LT_BEGIN_AUTO_TEST_ENV()
7741017
AUTORUN_TESTS()
7751018
LT_END_AUTO_TEST_ENV()

0 commit comments

Comments
 (0)