Commit 70611c2
fix(webapp): GC drained-env orphans in stale sweep + propagate batch on terminal failure
Two Devin review findings on PR #3754, both real and unresolved:
1. Sharded stale sweep's counts hash was never cleared for fully-drained
envs, so the gauge stayed permanently elevated and false-alerted the
recommended `> 0 for 5m` rule.
Root cause: when an env's last buffered entry is popped, the buffer's
atomic Lua removes the env from `mollifier:org-envs:${orgId}` (and
removes the org from `mollifier:orgs` if it has no other envs). The
sweep's inner loop walks `buffer.listEnvsForOrg(orgId)`, so the env
disappears from the iteration entirely — `setEnvStaleCount(envId, 0)`
(which HDELs the field) is never called, and the counts hash retains
the env's last-known stale count forever.
Fix (Devin's Approach 2): cycle-bounded reconciliation. Add a Redis
SET `mollifier:stale_sweep:visited` that the sweep SADDs into for
every env it touches. When the cursor wraps (cycle complete),
`reconcileVisited()` does `HKEYS counts → SMEMBERS visited → HDEL the
difference → DEL visited`. Pipelined; orphans clear within at most
one full cursor cycle of the env going quiet, which matches the
sharding contract's existing one-cycle freshness window.
Test: "evicts fully-drained envs from the counts hash at cycle wrap"
— accepts an entry, sweep flags it stale, pops the entry (env
vanishes from listEnvsForOrg), runs another sweep that triggers
wrap, asserts the env is HDEL'd from both the snapshot and the
underlying counts hash.
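A minimal in-memory sketch of the cycle-bounded reconciliation described above, assuming a `Map` stands in for the counts hash and a `Set` for `mollifier:stale_sweep:visited`. The names `StaleSweepState` and `recordVisit` are hypothetical, and `reconcileVisited` here only mirrors the steps listed in this message; it is not the code from this commit and makes no Redis calls.

```typescript
// Hypothetical in-memory model of the two Redis structures.
type StaleSweepState = {
  counts: Map<string, number>; // models the counts hash: envId -> stale count
  visited: Set<string>; // models the SET mollifier:stale_sweep:visited
};

// Called for every env the sweep touches during the current cursor cycle.
function recordVisit(state: StaleSweepState, envId: string, staleCount: number): void {
  state.visited.add(envId); // SADD visited envId
  if (staleCount === 0) {
    state.counts.delete(envId); // a zero count removes the field (HDEL)
  } else {
    state.counts.set(envId, staleCount); // otherwise store the count
  }
}

// Called when the cursor wraps: drop every counted env that was not visited
// during the cycle, then reset the visited set for the next cycle.
function reconcileVisited(state: StaleSweepState): string[] {
  const counted = Array.from(state.counts.keys()); // HKEYS counts
  const orphans = counted.filter((envId) => !state.visited.has(envId)); // minus SMEMBERS visited
  for (const envId of orphans) {
    state.counts.delete(envId); // HDEL the difference
  }
  state.visited.clear(); // DEL visited
  return orphans;
}
```

In this model, an env that was counted in one cycle and then fully drained is never passed to `recordVisit` in the next cycle, so it is returned as an orphan and removed at the following wrap.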
2. Drainer handler's terminal SYSTEM_FAILURE write dropped the
snapshot's `batch` field. If the buffered run was part of a batch,
the failure row wasn't associated with the batch and the batch
parent's completion tracking could hang indefinitely waiting on a
child that landed but wasn't visible to the batch.
Fix: extract `snapshot.batch` with structural type guards and pass
it through to `createFailedTaskRun`. Same defensive pattern as the
other snapshot fields in this code path (the snapshot is typed
`Record<string, unknown>` because it comes from a cjson-decoded buffer
payload).
Test: "propagates the batch association into createFailedTaskRun" —
asserts the call site receives `{ id, index }` from the snapshot.
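The structural type guards mentioned in the fix could look roughly like the sketch below. The `{ id, index }` shape comes from the test description; the concrete types (`string` id, `number` index) and the names `BatchRef`, `isRecord`, and `extractBatch` are assumptions for illustration, not the handler's actual code.

```typescript
// Hypothetical shape of the batch association carried in the snapshot.
type BatchRef = { id: string; index: number };

// Narrow an unknown value to a plain object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Pull `batch` out of an untyped snapshot, returning undefined unless both
// fields are present with the assumed types.
function extractBatch(snapshot: Record<string, unknown>): BatchRef | undefined {
  const batch = snapshot["batch"];
  if (!isRecord(batch)) {
    return undefined;
  }
  const id = batch["id"];
  const index = batch["index"];
  if (typeof id !== "string" || typeof index !== "number") {
    return undefined;
  }
  return { id, index };
}
```

Returning `undefined` for anything malformed keeps the failure write working for non-batch runs, while a well-formed value can be passed straight through to the call that records the failed run.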
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6254b4c commit 70611c2
5 files changed
Lines changed: 211 additions & 3 deletions
File tree
- apps/webapp
- app/v3/mollifier
- test
Lines changed: 16 additions & 0 deletions
Lines changed: 18 additions & 1 deletion
Lines changed: 54 additions & 2 deletions