fix(acp,#4760): dedup concurrent createSessionAsync by sessionId #4764
aegis-gh-agent[bot] wants to merge 680 commits into
Covers system, user, assistant, thinking, tool_use, tool_result, tool_error messages plus focus, keyboard shortcuts, timestamps, element ID generation, and clipboard interactions. Co-authored-by: Hephaestus <hep@aegis.dev>
- jsonl-watcher-worker.ts: worker_threads Worker that accepts watch/unwatch/setOffset commands, uses fs.watch with debouncing, reads JSONL entries via readNewEntries(), posts parsed results back to main thread via parentPort
- jsonl-watcher-bridge.ts: main-thread bridge implementing JsonlWatcher-compatible public API (onEntries, watch, unwatch, stop, destroy, setOffset, isWatching, getOffset) with EventBus integration for session.lineParsed events
- 10 tests: lifecycle, offset tracking, truncation detection, large fixture streaming, resume from offset, memory bounds, cleanup

Co-authored-by: Hephaestus <hep@aegis.dev>
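The offset tracking and truncation detection mentioned above can be illustrated with a minimal sketch. It is not the repository's `readNewEntries()`: the name `readNewEntriesSketch`, the `ReadResult` shape, and the use of string offsets (rather than byte offsets into a file) are assumptions made to keep the example self-contained.

```typescript
// Hypothetical result shape for this sketch only.
interface ReadResult {
  entries: unknown[];
  nextOffset: number;
  truncated: boolean;
}

// Parses complete JSONL lines that appear after `offset` in `content`.
// If the content is now shorter than the stored offset, the file is assumed
// to have been truncated and reading restarts from the beginning.
function readNewEntriesSketch(content: string, offset: number): ReadResult {
  const truncated = content.length < offset;
  const start = truncated ? 0 : offset;
  // Only consume up to the last newline so a half-written line is left for later.
  const end = content.lastIndexOf("\n") + 1;
  if (end <= start) {
    return { entries: [], nextOffset: start, truncated };
  }
  const entries: unknown[] = [];
  for (const line of content.slice(start, end).split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // Malformed lines are skipped in this sketch; a real watcher would likely log them.
    }
  }
  return { entries, nextOffset: end, truncated };
}
```

Resuming from a stored offset then amounts to passing the previous `nextOffset` back in on the next change notification.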
…#4361)
* fix(api): remove session counts from unauthenticated /health response
  Issue #4355: The /health endpoint was returning session counts (active, total) without requiring authentication. This is minor info disclosure.
  Fix: unauthenticated requests now only receive {status: "ok"}. Authenticated requests still get the full response including session counts, version, uptime, and claude status.
  Monitors that relied on unauthenticated session counts (Issue #3739) should use authenticated requests going forward.
* test: update health endpoint tests for unauthenticated response change
  Issue #4355: Tests expected session counts on unauthenticated /health responses. Updated to expect only {status} for unauthenticated callers.
* chore: trigger PR sync
---------
Co-authored-by: OneStepAt4time <noreply@onestepat4time.com>
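A minimal sketch of the response split described above, assuming hypothetical field names (`uptimeSeconds`, `claudeStatus`, `sessions`); only the `{status: "ok"}` body for unauthenticated callers comes from the commit message.

```typescript
// Hypothetical full payload; the real field names may differ.
interface FullHealth {
  status: "ok";
  version: string;
  uptimeSeconds: number;
  claudeStatus: string;
  sessions: { active: number; total: number };
}

type HealthResponse = { status: "ok" } | FullHealth;

// Unauthenticated callers only learn that the server is up;
// authenticated callers receive the full payload.
function buildHealthResponse(authenticated: boolean, full: FullHealth): HealthResponse {
  if (!authenticated) {
    return { status: "ok" };
  }
  return full;
}
```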
#4364)
* feat(eventbus): add Redis Streams EventBus + SSE bridge (#4229 phase C-D)
* fix(eventbus): type-safe tests + LogContext fixes + handler error isolation
  - Rewrote redis-event-bus.test.ts with proper types (RedisLike, BusEvent)
  - Rewrote sse-bridge.test.ts with typed mocks
  - Fixed LogContext err→attributes.error in redis-event-bus.ts
  - Wrapped setImmediate handler calls in try-catch for error isolation
  - Removed unused @ts-expect-error directives
  - 17 tests passing (10 Redis + 7 SSE bridge)
* fix(eventbus): address review feedback - async replay, publish fix, SSE types, auth, rescan
  Review fixes for PR #4364:
  1. CRITICAL: replaySince now async; awaits xrange (was silently returning [] with real async Redis clients)
  2. CRITICAL: publish() returns local seq immediately; xadd is fire-and-forget with error logging (no silent data loss)
  3. CRITICAL: SSE bridge uses lastEventId to replay missed events
  4. Pattern subscriptions rescan every 5s to discover new streams
  5. New subscriptions start from '+' (no history replay)
  6. SSE bridge uses proper Fastify types (no any)
  7. Auth covered by global onRequest hook (setupAuth)
  8. Test files updated for async replaySince interface
* fix(eventbus): bump bundle threshold + replace Function type in tests
  Rebase onto develop brought threshold to 2480KB; fix lint errors in sse-bridge.test.ts by replacing bare Function type with proper signature.
---------
Co-authored-by: OneStepAt4time <noreply@onestepat4time.com>
Co-authored-by: Hephaestus <hep@aegis.dev>
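Review fixes 1 and 2 above can be sketched roughly as follows. This is an illustration under assumptions, not the code in redis-event-bus.ts: the `RedisLike` surface shown here, the `SketchEventBus` class, and the field layout passed to `xadd` are hypothetical.

```typescript
// Hypothetical minimal client surface; a real Redis client exposes far more.
interface RedisLike {
  xadd(stream: string, id: string, ...fieldsAndValues: string[]): Promise<string>;
  xrange(stream: string, start: string, end: string): Promise<Array<[string, string[]]>>;
}

class SketchEventBus {
  private seq = 0;

  constructor(
    private redis: RedisLike,
    private onError: (err: unknown) => void,
  ) {}

  // Returns the local sequence number immediately. The write to the stream is
  // fire-and-forget, but a failure is reported instead of being dropped.
  publish(stream: string, payload: string): number {
    const localSeq = ++this.seq;
    this.redis
      .xadd(stream, "*", "seq", String(localSeq), "payload", payload)
      .catch(this.onError);
    return localSeq;
  }

  // Must await xrange: with an async client, a synchronous read would see
  // a pending promise rather than the entries.
  async replaySince(stream: string, lastId: string): Promise<Array<{ id: string; fields: string[] }>> {
    const entries = await this.redis.xrange(stream, lastId, "+");
    return entries
      .filter(([id]) => id !== lastId) // the range start is inclusive, so drop the last seen entry
      .map(([id, fields]) => ({ id, fields }));
  }
}
```

An SSE bridge could pass the client's last event ID as `lastId` to replay what was missed, as fix 3 describes.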
…#4246 step 5) Extract approval/reject/timeout logic into SessionApprovalService.

- approval-flow.ts: 86-line service with DI, approve/reject/timeout
- session.ts: 1648→1102 lines (-33%)
- 10 tests covering approve, reject, auto-reject, cancel, error tolerance

Part of #4246 server.ts god object decomposition.
Update api-reference.md to reflect actual unauthenticated response (status only, no session counts).
…standalone modules (#4246 step 6) Extract readHookSecretFromSettingsFile to hook-secret-reader.ts and computeLatencyMetrics to latency-metrics.ts. 23 new tests. session.ts: 1648→1052 (-36.2% cumulative). Bundle threshold 2480→2490.
…ep 7) Extract pure construction logic from _createSession into buildSessionInfo() factory function. session.ts: 1648 → 926 lines (-43.8%). 14 new unit tests.
…h.ts (#4246 step 8) (#4369) * refactor(session): extract health/monitoring methods to session-health.ts (#4246 step 8) - Extract computeLatencyMetrics, checkWaitingForInput, buildSessionHealth to src/services/session/session-health.ts as standalone functions - Export LatencyMetrics and SessionHealthInfo types - SessionManager delegates to extracted functions - 12 new tests for all three functions - session.ts: 1102 → 1047 lines (-55, -5%) 🤖 Generated with Aegis Session: ag-hep heartbeat Verification: tsc ✅ build ✅ 5768 tests ✅ * fix(lint): replace @ts-ignore with @ts-expect-error in session-health test * test(session): remove unnecessary @ts-expect-error directives in session-health tests (#4369 requested) * fix(session-health): accept nullable SessionInfo in computeLatencyMetrics * fix(session-health): remove duplicate computeLatencyMetrics, use latency-metrics.ts --------- Co-authored-by: Hephaestus <hep@aegis.dev>
Replace blind 10s polling with SSE-triggered fetch for audit live tail.

- SSE trigger: watches activities count, fetches on new activity
- 2s debounce prevents burst-fetching
- Degraded polling fallback when SSE disconnected
- Shared liveTailFetch helper eliminates duplication

Part of #4346.
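The debounce step above might look like the following sketch. The helper name `createDebouncedTrigger` is hypothetical, and a trailing-edge debounce is an assumption; the commit message does not say whether the fetch fires at the start or the end of the quiet window.

```typescript
// Returns a trigger that runs `fetchFn` once, `delayMs` after the last call in a burst.
function createDebouncedTrigger(fetchFn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) {
      clearTimeout(timer); // a newer activity restarts the quiet window
    }
    timer = setTimeout(() => {
      timer = undefined;
      fetchFn();
    }, delayMs);
  };
}
```

With `delayMs` set to 2000, a burst of SSE activity updates would collapse into a single fetch.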
…4349) Add immediate visual feedback and audit logging for inline Telegram callback approvals/rejections.
- editMessage shows approver/rejector name inline
- StructuredLogger audit trail for approve/reject actions
- cb_option also shows confirmed selection
- All edits wrapped in try/catch (non-critical failures)
- Only telegram-polling.ts changed (+8/-1)

Closes #4349
Wire SSE bridge in server.ts. Follow-up to #4229.
Co-authored-by: OneStepAt4time <noreply@onestepat4time.com>
- Add SSEStatusIndicator component with colored dot + label (green/Live, yellow/Reconnecting…, red/Offline)
- Wire into Header.tsx next to ApprovalBadge
- Includes accessibility: role=status, aria-label, animate-pulse dot
- 5 unit tests covering all states
- Uses existing useStore (sseConnected, sseError), no new deps

Co-authored-by: Hephaestus <hep@aegis.dev>
…ep 2) (#4378) - session-reaper.ts: reapStaleSessions + reapZombieSessions + zombie constants - config-watcher.ts: setupConfigWatcher + handleConfigReload - server.ts: 840 → 672 lines (-20%) - Zero behavior changes, all 5804 tests pass Co-authored-by: Hephaestus <hep@aegis.dev>
…ch 6) (#4381) 24 new tests across 3 files: ProtectedRoute (4), Code (10), PipelineStatusBadge (10). Zero production code changes.
Two new guides:

- agent-bootstrap-best-practice.md: GitHub as source of truth pattern
- real-time-events.md: SSE event reference, auth, reconnection strategy

357 lines, 2 files. Docs-only.
Upgrade onboarding wizard from static to health-aware.

- Connect step: shows Claude status (connected/warning/unavailable)
- First Session step: shows active sessions or Create Session CTA
- Explore step: shows total session count
- Auto-marks step 2 complete when Claude healthy
- Auto-opens session drawer on completion when no sessions
- 3s timeout fallback for health fetch
- 18 tests covering all states and transitions

455 additions, 2 files.
#4384) Co-authored-by: Hephaestus <hep@aegis.dev>
…ests (batch 8) (#4387) Co-authored-by: Hephaestus <hep@aegis.dev>
…sts (batch 9) (#4388) 46 new tests across 3 components. All pass, zero TS errors, no production code changes. Components tested: - TokenBreakdown (14 tests) - ConfirmDestructive (16 tests) - RateLimitCard (16 tests)
Add Security section to real-time-events.md covering authentication, authorization (tenant scoping, ownership checks), connection limits, and audit findings. Addresses Themis audit findings from #4393.
…prep) (#4386) - Add src/boot/boot-auth.ts re-export facade for auth setup - Extract config watcher to src/boot/boot-config-watcher.ts with DI timers - Add 9 unit tests for boot-config-watcher - Bump bundle size threshold to 2508KB - Update server.ts import path to boot facade Prep work for #4227 — no behavior changes.
Documents the SSE bridge endpoint in: - api-reference.md: full endpoint spec with auth, tenant scoping, connection limits, and error responses - api-quick-ref.md: quick reference entry - real-time-events.md: SSE bridge section alongside global/per-session Endpoint was introduced in #4373 and hardened in #4395 but never documented. Co-authored-by: Hephaestus <hep@aegis.dev>
… error When eventBus.subscribeGlobal() throws, the response headers had already been written (200). The subsequent reply.status(500).send() would fail silently or throw, leaving the client with a broken SSE stream and no error feedback. Reorder: subscribe first, then writeHead only on success.
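The reordering described above (subscribe first, write headers only on success) can be sketched as below. `GlobalBusLike`, `SseReplyLike`, and `openSseStreamSketch` are hypothetical stand-ins for the real bus and Fastify reply types.

```typescript
// Hypothetical minimal interfaces for this sketch.
interface GlobalBusLike {
  subscribeGlobal(handler: (event: string) => void): () => void;
}
interface SseReplyLike {
  writeHead(status: number, headers: Record<string, string>): void;
  sendError(status: number, message: string): void;
}

// Subscribes before any header is written, so a failing subscription can still
// be reported as a 500 instead of leaving a broken 200 stream behind.
function openSseStreamSketch(
  bus: GlobalBusLike,
  reply: SseReplyLike,
  onEvent: (event: string) => void,
): (() => void) | undefined {
  let unsubscribe: (() => void) | undefined;
  try {
    unsubscribe = bus.subscribeGlobal(onEvent);
  } catch {
    reply.sendError(500, "subscription failed");
    return undefined;
  }
  reply.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  return unsubscribe;
}
```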
When JSON.parse fails on session or pipeline data in Redis, the error
was silently swallowed and the session appeared non-existent. Now logs
a warning with the session/pipeline ID and error message so operators
can diagnose data corruption.
Root cause: bare catch {} with no logging.
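A minimal sketch of replacing the bare `catch {}` with a logged warning. The function name `parseStoredRecord` and the `warn` callback are hypothetical; the real code presumably uses the project's own logger.

```typescript
// Parses a stored JSON record. On failure it reports the record ID and the
// parse error instead of silently treating the record as missing.
function parseStoredRecord<T>(
  id: string,
  raw: string,
  warn: (message: string) => void,
): T | undefined {
  try {
    return JSON.parse(raw) as T;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`Failed to parse stored record ${id}: ${reason}`);
    return undefined;
  }
}
```

The caller still sees `undefined` for a corrupt record, but operators now get a log line that names the affected ID.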
…#4715) Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) from 25.9.2 to 25.9.3. - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) --- updated-dependencies: - dependency-name: "@types/node" dependency-version: 25.9.3 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) from 25.9.1 to 25.9.3. - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) --- updated-dependencies: - dependency-name: "@types/node" dependency-version: 25.9.3 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [prettier](https://github.com/prettier/prettier) from 3.8.3 to 3.8.4. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.8.3...3.8.4) --- updated-dependencies: - dependency-name: prettier dependency-version: 3.8.4 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [@tanstack/react-virtual](https://github.com/TanStack/virtual/tree/HEAD/packages/react-virtual) from 3.13.26 to 3.14.2. - [Release notes](https://github.com/TanStack/virtual/releases) - [Changelog](https://github.com/TanStack/virtual/blob/main/packages/react-virtual/CHANGELOG.md) - [Commits](https://github.com/TanStack/virtual/commits/@tanstack/react-virtual@3.14.2/packages/react-virtual) --- updated-dependencies: - dependency-name: "@tanstack/react-virtual" dependency-version: 3.14.2 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [eslint](https://github.com/eslint/eslint) from 10.4.1 to 10.5.0. - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](eslint/eslint@v10.4.1...v10.5.0) --- updated-dependencies: - dependency-name: eslint dependency-version: 10.5.0 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 8.0.14 to 8.0.16. - [Release notes](https://github.com/vitejs/vite/releases) - [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md) - [Commits](https://github.com/vitejs/vite/commits/v8.0.16/packages/vite) --- updated-dependencies: - dependency-name: vite dependency-version: 8.0.16 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…4722) Bumps [tailwindcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss) from 4.3.0 to 4.3.1. - [Release notes](https://github.com/tailwindlabs/tailwindcss/releases) - [Changelog](https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md) - [Commits](https://github.com/tailwindlabs/tailwindcss/commits/v4.3.1/packages/tailwindcss) --- updated-dependencies: - dependency-name: tailwindcss dependency-version: 4.3.1 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…4718) Bumps [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) from 1.16.0 to 1.18.0. - [Release notes](https://github.com/lucide-icons/lucide/releases) - [Commits](https://github.com/lucide-icons/lucide/commits/1.18.0/packages/lucide-react) --- updated-dependencies: - dependency-name: lucide-react dependency-version: 1.18.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
) Bumps [@opentelemetry/sdk-node](https://github.com/open-telemetry/opentelemetry-js) from 0.218.0 to 0.219.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-js/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-js/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-js@experimental/v0.218.0...experimental/v0.219.0) --- updated-dependencies: - dependency-name: "@opentelemetry/sdk-node" dependency-version: 0.219.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
- nodemailer 8.0.10 → 9.0.0 - v9 breaking change (TLS cert validation for remote attachment fetching) does not affect our simple SMTP usage - Addresses CVE patches in v9
…rance-4683 smoke pass audit (#4724) * test(cli): add 422 + Content-Type regression tests for ag send (#4707 follow-up) * docs(audits): capture endurance-4683 pre-instrumentation smoke pass (32min warmup window) Squash-merging #4724 per Boss lane convention (CLI authed as Ema, reviewDecision APPROVED, full CI green, merge_state CLEAN). Reviewed-by: aegis-gh-agent[bot] (LGTM via COMMENTED review 4494397414) Approved-by: OneStepAt4time (Emanuele) via `gh pr review --approve` (lane convention)
* feat(dashboard): perf instrumentation for endurance test #4683 * fix(dashboard): pass withCredentials to EventSource for cookie-based SSE * fix(dashboard): Argus review #4723 — time-base bug, INP rename, LCP pagehide, dedup, integration test Squash-merging #4723 per Boss lane convention (CLI authed as Ema, reviewDecision APPROVED, full CI green, merge_state MERGEABLE). Reviewed-by: aegis-gh-agent[bot] (APPROVE via review 4494508639, re-reviewed round 2) Approved-by: OneStepAt4time (Emanuele) via `gh pr review --approve` (lane convention)
Document the new endurance-test instrumentation in the dashboard guide: - perfRecorder bounded ring buffers + snapshot API - usePerfPageLoad double-RAF pattern - webVitals field rename: inpMs → longestEventDurationMs (raw event, not strict INP) - endpointFromUrl shared utility - PerfPanel dev overlay (?perf=1) Closes #4723 docs gap. Co-authored-by: Hephaestus <hep@aegis.dev>
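As an illustration of the "bounded ring buffers + snapshot API" idea, here is a generic sketch. It is not the actual perfRecorder; the `RingBuffer` class and its method names are assumptions.

```typescript
// Fixed-capacity buffer that overwrites the oldest item once full,
// so memory stays bounded during long-running sessions.
class RingBuffer<T> {
  private items: T[] = [];
  private next = 0;

  constructor(private readonly capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
  }

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item);
    } else {
      this.items[this.next] = item; // overwrite the oldest slot
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Returns a copy in oldest-to-newest order.
  snapshot(): T[] {
    if (this.items.length < this.capacity) {
      return [...this.items];
    }
    return [...this.items.slice(this.next), ...this.items.slice(0, this.next)];
  }
}
```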
#4730) Closes #4726 (CLS=0.31 → <0.1 target) Endurance test #4683 surfaced catastrophic CLS on overview: 37× beyond 0.1 good threshold over 2.5h parked. Root cause: SSE-driven session list updates + KPI banner loading→loaded transitions cause layout shifts because elements have no reserved height. Three CSS-level fixes: 1. VirtualizedSessionList row style — add minHeight + contentVisibility auto + containIntrinsicSize for both session and group rows, so off-screen rows reserve their space and overscan doesn't shift. 2. VirtualizedSessionList container — track peak observed list height via useRef and never shrink below it, so SSE removals don't pull the page up. 3. KPIBanner + OverviewPage — reserve min-h-[52px] on the data grid AND on the loading skeleton, with matching grid-template-columns, so the loading→loaded transition is layout-stable. Constants ROW_HEIGHT=52, GROUP_ROW_HEIGHT=44 are now exported for test pinning. Regression test cls-regression.test.tsx guards the constants, the KPIBanner min-h, and the empty-state h-[52px]. Verification: - typecheck: pass (pre-existing JSX namespace errors unrelated) - vitest: 2272/2272 pass (including 4 new CLS regression tests) - vite build: green (1.57s) - TSC noEmit on touched files: clean Touches only dashboard/src/. No new deps. No backend changes. Co-authored-by: Hephaestus <hep@aegis.dev>
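Fix 2 above (never shrink below the peak observed height) reduces to a running maximum. The sketch below shows that logic outside React; `createPeakHeightTracker` and `estimateListHeight` are hypothetical helpers, and only the two row-height constants are taken from the commit message.

```typescript
// Row heights quoted from the commit message above.
const ROW_HEIGHT = 52;
const GROUP_ROW_HEIGHT = 44;

// Returns a function that reports the minimum height the container should keep:
// the largest height observed so far. In a component this state would live in a ref.
function createPeakHeightTracker(): (observedHeight: number) => number {
  let peak = 0;
  return (observedHeight) => {
    peak = Math.max(peak, observedHeight);
    return peak;
  };
}

// Hypothetical helper: list height for a given mix of session and group rows.
function estimateListHeight(sessionRows: number, groupRows: number): number {
  return sessionRows * ROW_HEIGHT + groupRows * GROUP_ROW_HEIGHT;
}
```

When an SSE update removes rows, the tracker keeps returning the earlier peak, so the content below the list does not jump upward.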
Brings the 4 token-mint scripts + lifecycle manager from ~/.openclaw/workspace/infra/github-apps/ into the repo under scripts/github-apps/: - get-installation-token.sh (legacy aegis-gh-agent fallback, 75d old) - get-installation-token-hermes.sh (per-agent, APP_ID/INSTALLATION_ID placeholders until §A clears) - get-installation-token-argus.sh (same) - get-installation-token-hephaestus.sh (same) - manage-aegis-apps.sh (check/audit/mint/rotate-cron lifecycle) PEMs are intentionally NOT included — they're local-only secrets with mode 600. Scripts fail fast with a clear error if the sibling PEM is missing or wrong mode. Themis script-only review LGTM (msg 1515974350552170638, 2026-06-15 09:00 CEST). Closes partial: #4731 (ship-now slice — scripts half) Refs: #4665, #4725, #4728 Co-authored-by: Hermes <hermes@aegis.dev>
…f factual errors (#4734) Fixes factual errors in docs/devops/branch-protection-checklist.md (landed in #4733): 1. Required status checks: doc presented 11 as required; only 4 are actually enforced in the repo settings UI. Split into Current (4) + Planned (6 with tracking). 2. Workflow file path references: 5 broken (lint-pr-title.yml, dashboard-test.yml, platform-smoke.yml, sdk-drift.yml, dependency-review.yml, gate.yml do not exist as files; the jobs live in ci.yml or are not implemented). 3. DRAFT-skip list: 4 wrong entries; corrected to match the #4557 audit (feat-minor-bump-gate, platform-smoke, dashboard-e2e, release-dry-run, helm-smoke). 4. CODEOWNERS claim: doc said Themis lists security paths; actual file has only @OneStepAt4time. Documented accurately with a Planned reviewer model section tied to #4665/#4731. 5. Required reviewers: develop does not require CODEOWNER reviews; main does. Now stated explicitly. Adds a Quick reference table (11 verified branch-protection settings) at the top so the doc is self-validating. Cross-references docs/devops/per-agent-identities.md. Verified 2026-06-15 via gh api on both branch protection endpoints and find/grep on .github/workflows. No code touched; pure docs change. Refs: #4733 (parent), #4665 (per-agent identities), #4732 (deferred identity binding), #4557 (DRAFT-skip audit), #4559 (helm-smoke held back). Closes #4734.
…4735) Boss's directive (msg 1516056129837207713, 2026-06-15 14:22): Path (b) in #4725 is the cheaper path — document the 'Boss via CLI-as-Ema' lane convention in CONTRIBUTING.md (and cross-link from .claude/rules/prs.md + branching.md). Doc-only, no infra, scoped to 1 PR. The CONTRIBUTING.md Path (b) section landed via #4733 (merged 2026-06-15). This PR adds the cross-links in the rules files so: - prs.md: 'Review' section now references Path (b) and captures the `gh pr review --approve <N>` invocation verbatim - branching.md: new 'Bot-authored PRs' section + rule against opening a bot-authored PR without a documented approval path Acceptance (per Boss): - (1) PR open with the lane recipe + invocation captured verbatim ✓ - (2) Themis ack on the documented security model (bot ≠ author, branch protection required_approving_review_count: 1 still enforced) — PENDING - (3) npm run gate green — to run post-commit Closes partial: #4725 (Path (b) — cross-link half) Refs: #4733 (CONTRIBUTING.md Path (b) section), #4728 (unattended-merge) Co-authored-by: Hermes <hermes@aegis.dev>
…4741) * fix(acp): sendPrompt awaits in-flight background handshake (#4738) createSessionAsync fires a background handshake (fire-and-forget). When sendPrompt is called before the handshake completes, it doesn't find the runtime and calls autoResumeRuntime, which issues session/resume on a CC process that already has an active session → conflict → session enters failed state → prompt_ack_timeout / no_acp_runtime. Fix: track in-flight handshakes in a pendingHandshakes Map. In sendPrompt, await the pending handshake before falling through to autoResumeRuntime. Only attempt autoResume if no handshake was ever started (session is truly idle). Files: - backend.ts: add pendingHandshakes map, register in createSessionAsync, delegate sendPrompt to prompts.sendPromptWithHandshakeWait - backend/prompts.ts: add sendPromptWithHandshakeWait, extend PromptDeps - __tests__/acp-sendprompt-handshake-race-4738.test.ts: 4 regression tests (idle→error, await handshake→deliver, failed handshake→error, completed handshake→no autoResume) Gate: 6285 pass / 10 skip / 0 fail. backend.ts at 499 lines. * fix(routes): split delivery catch — 422 for delivery, 500 for internal Issue #4738: sendPrompt now awaits the in-flight background handshake from createSessionAsync, which means the send handler can exercise the real CC session/prompt path in environments where the handshake completes. That path can fail with a JSON-RPC error (e.g. -32601 Method not found from inner sendPrompt per #4705), which the previous generic catch mapped to 404. The 404 response was wrong on two counts: - Delivery failures are semantically 422 (Unprocessable Entity), not 404 (Not Found). The route already returned 422 for the !result.delivered path; the catch path was inconsistent. - 404 was returned for ANY error in the try block, conflating delivery failures with internal errors (e.g. monitor.getStallInfo throwing). 
This was masked in local dev because the test environment has no real claude binary → handshake fails fast → no runtime → 422. CI has a real binary → handshake can complete → real CC delivery can throw → 404. server-core-coverage.test.ts surfaced the gap. Fix: wrap prompt delivery in its own try/catch returning 422 (matching the !result.delivered shape). Outer try/catch now only handles post-delivery work (monitor, channels) and returns 500 on internal error. Verified: 6285 pass / 10 skip / 0 fail (gate), 22 mcp-integration- smoke-1898 tests still pass (the only send 404 expectation is for unknown-session which comes from requireSessionOwnership, not this catch). * fix(routes): /command soft-fails to 200 with delivered:false on delivery error Issue #4738: After the pendingHandshakes-await fix, /command can throw a JSON-RPC error during a real-CC handshake race in CI. The previous generic catch mapped any error to 404, breaking server-core-coverage.test.ts:204 which expects [200, 429] for /command. The /send test was updated for #4705 to accept [200, 422, 429] (delivery failure is a hard client error there). /command keeps the older soft-fail contract: a transient handshake race returns 200 with delivered:false so the caller knows the message did not reach CC, but the HTTP request itself succeeded. This matches what the test asserted pre-#4705 and what callers of /command (CLI ag command, MCP send_command) already handle. 404 is reserved for session-not-found, which comes from requireSessionOwnership upstream of this try block — not the catch here. Verified: 6285 pass / 10 skip / 0 fail (gate), +21/-3. --------- Co-authored-by: Hermes <hermes@aegis.dev> Co-authored-by: Hephaestus <hephaestus@aegis.dev>
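The handshake wait from the first commit in this squash can be sketched as follows. `pendingHandshakes`, `autoResumeRuntime`, and `sendPromptWithHandshakeWait` are names from the commit message; the `RuntimeSketch` and `PromptDepsSketch` shapes and the exact error handling are assumptions.

```typescript
// Hypothetical shapes for this sketch.
interface RuntimeSketch {
  deliver(prompt: string): Promise<boolean>;
}
interface PromptDepsSketch {
  runtimes: Map<string, RuntimeSketch>;
  pendingHandshakes: Map<string, Promise<unknown>>;
  autoResumeRuntime(sessionId: string): Promise<RuntimeSketch>;
}

async function sendPromptWithHandshakeWaitSketch(
  deps: PromptDepsSketch,
  sessionId: string,
  prompt: string,
): Promise<boolean> {
  const pending = deps.pendingHandshakes.get(sessionId);
  if (pending) {
    // A handshake is in flight: wait for it instead of resuming a second time.
    // A failed handshake rejects here and surfaces as an error to the caller.
    await pending;
    const runtime = deps.runtimes.get(sessionId);
    if (!runtime) throw new Error("handshake finished without registering a runtime");
    return runtime.deliver(prompt);
  }
  // No handshake was started: use the existing runtime or resume an idle session.
  const runtime = deps.runtimes.get(sessionId) ?? (await deps.autoResumeRuntime(sessionId));
  return runtime.deliver(prompt);
}
```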
…tch (#4741) Documents the user-facing route-catch changes shipped in PR #4741. - /send: 422 catch path shape + 500 row, 404 pinned to ownership check - /command: success + soft-fail response shapes, 400/404/429 rows - /command example updated with delivered + attempts + soft-fail callout Refs: #4738
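The status-code contract documented here can be summarized in a short sketch. The handler names and response bodies are hypothetical; only the status codes (422 for delivery failure on /send, 500 for post-delivery internal errors, 200 with `delivered: false` for /command soft-fail) come from the commit messages.

```typescript
interface RouteResultSketch {
  status: number;
  body: { delivered?: boolean; error?: string };
}
type Deliver = () => Promise<{ delivered: boolean }>;

// /send: delivery problems are 422; failures in post-delivery work are 500.
async function handleSendSketch(deliver: Deliver, postDelivery: () => Promise<void>): Promise<RouteResultSketch> {
  let result: { delivered: boolean };
  try {
    result = await deliver();
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { status: 422, body: { delivered: false, error } };
  }
  if (!result.delivered) {
    return { status: 422, body: { delivered: false } };
  }
  try {
    await postDelivery(); // e.g. monitor and channel updates
  } catch {
    return { status: 500, body: { error: "internal error" } };
  }
  return { status: 200, body: { delivered: true } };
}

// /command: a delivery error soft-fails to 200 with delivered:false.
async function handleCommandSketch(deliver: Deliver): Promise<RouteResultSketch> {
  try {
    const result = await deliver();
    return { status: 200, body: { delivered: result.delivered } };
  } catch {
    return { status: 200, body: { delivered: false } };
  }
}
```

In both routes, 404 stays reserved for the ownership check that runs before these handlers.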
…#4739) (#4743) SSE-push inserts were animating layout properties via transition-all + 320ms --duration-slow, pushing p100 CLS over 0.5. This change: - Replace transition-all on the session row with a precise transition list restricted to paint/composite properties (background-color, border-color, box-shadow, transform, opacity, color) and 150ms duration. - Add contain: layout to the list container (and containLayout: true for the regression test) so SSE-push inserts cannot reflow the page. - Add 3 regression tests in cls-regression.test.tsx guarding the row transition list and the list containment declaration. Co-authored-by: Daedalus <daedalus@aegis.local>
…st (#4736) (#4744) Resolves #4736. Test-infra fix: the screenshot error-handling test was tightening the 5s vitest default against a function whose internal `page.goto` already uses 30s. Under `test:serial` system load, the browser launch could exceed 5s and vitest would abort as a timeout instead of letting the real throw surface. Bumping the per-test timeout to 30s matches the inner constraint and eliminates the flake.
…ePush hook (#4746)
Decouples #4683 Phase 2 stress measurement from the broken live CC ACK round-trip (#4737/#4738). The mock producer drives synthetic SSE push events directly into the dashboard perfRecorder via a guarded window hook, enabling reliable ssePushToRenderCount >= 100 over a 30-min window. The env-guard (`import.meta.env.DEV`) ensures the test-only injection point is stripped from production builds.
- perfRecorder: expose _testInjectSsePush on __aegisPerf__ under DEV mode
- perfRecorder.test: 9/9 unit tests pass, including the new hook test
- scripts/perf/mock-sse-producer.mjs: Playwright-based rig with addInitScript auth, env-driven config, PID file, watchdog, JSONL logging
- scripts/perf/README.md: env vars, AC, related issues

Lane: Hermes (devops). Parent: #4740. Related: #4683 endurance, #4737/#4738 ACK defects, #4739 CLS tail breach.
…t to cron variants (#4756) * Add scripts/devops/add-cron-failure-alerts.sh: idempotent script that adds a default `failureAlert` block to crons missing one; dry-run by default with `APPLY=1` to update. Targets the 7 in-scope crons (skips 23c0cc1d Daedalus PHASE2-WATCH which has its own tuned alert after:10/cooldown:30min). * Addresses #4754 (P0: release-please dispatch cron dbe0ed03 errored 4x with no Discord alert; silent failure was the bug, not the symptom). Re-arm of dbe0ed03 is gated by #4755 (LLM upstream health) and #4683 (endurance protocol) closure — Athena holds the re-arm gate. The failureAlert path is now operationally proven by this PR's own functional evidence (test cron b9e48190-1d2e-42ff-910a-c1bdc3d7d4b3 forced-fail run, 2 Discord alerts fired within ~15s — msg ids 1516817940547113022 + 1516817941121863731). * 9-gate review by aegis-gh-agent[bot] (id 4516878041) + Ema's manual APPROVE (id 4517238425) for the CODEOWNERS count. Reviewer closes #4754 on merge per the carve-out (DoD as-written met, no re-scoping, no parallel-tracker split).
Two concurrent createSessionAsync calls for the same sessionId must dedup to a single handshake. The current code makes 2 calls to startNewRuntimeBackground and returns 2 distinct ready promises; the second overwrites the first in pendingHandshakes, and the first .finally() then deletes the second's entry before it resolves — re-introducing the #4738 race. Red is in. 2/3 tests fail.
Two concurrent createSessionAsync calls for the same sessionId now dedup to a single in-flight handshake. The check-then-set is safe under JS microtask ordering (no await between the get and the set), so the second concurrent caller always sees the first's Map entry and returns the existing ready promise. Prevents the #4738 race regression: the second call no longer overwrites the first in pendingHandshakes, so the first .finally() cannot prematurely delete the second's entry before it resolves. sendPrompt during the second handshake now correctly awaits it instead of falling through to autoResume. Implementation: extracted the new-handshake setup to launchBackgroundHandshake() to keep createSessionAsync lean (the main file is at the 500-line gate limit per AGENTS.md). Regression test at src/__tests__/acp-sendprompt-handshake-race-4760.test.ts: 3/3 pass. Pre-fix: 2/3 fail (red), post-fix: 3/3 pass (green). Hardening items (max-size+LRU, handshake TTL+event, ReadonlyMap type) tracked separately per the issue body. No public API change.
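The check-then-set dedup described above can be sketched as below. `HandshakeRegistrySketch` and `ensureHandshake` are hypothetical names; the point being illustrated is that there is no `await` between the `get` and the `set`, so a second synchronous caller always sees the first caller's entry.

```typescript
class HandshakeRegistrySketch {
  private pending = new Map<string, Promise<void>>();
  launches = 0; // exposed only so the sketch can be checked

  constructor(private startHandshake: (sessionId: string) => Promise<void>) {}

  ensureHandshake(sessionId: string): Promise<void> {
    const existing = this.pending.get(sessionId);
    if (existing) {
      return existing; // dedup: reuse the in-flight handshake
    }
    // No await between the get above and the set below.
    this.launches++;
    const ready = this.startHandshake(sessionId);
    this.pending.set(sessionId, ready);
    const cleanup = () => {
      this.pending.delete(sessionId);
    };
    // Remove the entry once the handshake settles, whether it succeeded or failed.
    ready.then(cleanup, cleanup);
    return ready;
  }
}
```

Because only one entry is ever registered per in-flight handshake, its cleanup cannot delete an entry that belongs to a later caller.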
…fresh ones
Themis security review identified that the initial dedup implementation
synthesized a fresh backendRunId via this.backendRunIdProvider() and
returned the second call's session (because we awaited
sessionService.createSession before the dedup check). This breaks the
identity contract for downstream consumers: getBackendRunId lookups
keyed on the first call's backendRunId would miss when the second call
returned a different one.
Fix: enrich the pendingHandshakes Map value type from Promise<unknown>
to { session, backendRunId, ready } so the dedup path can return the
FIRST call's references verbatim. Update prompts.ts to await
pending.ready instead of pending. No public API change.
Regression test: added a 4th test (dedup returns the FIRST call's
session and backendRunId, not fresh ones) with a counter-incrementing
mock for sessionService.createSession that proves the dedup returns
the first call's object references, not freshly-synthesized ones.
Also updated the existing #4738 test (which sets the Map directly
to verify sendPrompt's handshake-await behavior) to use the new
shape. 4/4 #4760 tests pass; 4/4 #4738 tests pass; full gate
6289 tests pass, 0 failures, 251s.
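A rough sketch of the enriched Map value described above. `pendingHandshakes`, `backendRunIdProvider`, and `launchBackgroundHandshake` are names from the commit messages; the class, its constructor wiring, and the placement of the `await` before the dedup check are assumptions for illustration.

```typescript
interface PendingHandshake<S> {
  session: S;
  backendRunId: string;
  ready: Promise<void>;
}

class DedupBackendSketch<S> {
  private pendingHandshakes = new Map<string, PendingHandshake<S>>();

  constructor(
    private createSession: (sessionId: string) => Promise<S>,
    private backendRunIdProvider: () => string,
    private launchBackgroundHandshake: (sessionId: string) => Promise<void>,
  ) {}

  async createSessionAsync(sessionId: string): Promise<PendingHandshake<S>> {
    // Each concurrent caller may build its own session object here...
    const session = await this.createSession(sessionId);
    const existing = this.pendingHandshakes.get(sessionId);
    if (existing) {
      // ...but a duplicate caller gets the FIRST call's references back verbatim.
      return existing;
    }
    const entry: PendingHandshake<S> = {
      session,
      backendRunId: this.backendRunIdProvider(),
      ready: this.launchBackgroundHandshake(sessionId),
    };
    this.pendingHandshakes.set(sessionId, entry);
    const cleanup = () => {
      this.pendingHandshakes.delete(sessionId);
    };
    entry.ready.then(cleanup, cleanup);
    return entry;
  }
}
```

A consumer that needs the runtime would await `entry.ready`, which matches the note above that prompts.ts now awaits `pending.ready` instead of `pending`.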
Wrong base + scope violation — REJECTED
Gate #9 violation (per AGENTS.md + SOUL.md): baseRefName: main — PR must target develop. main is release-only and reserved for release-please PRs / explicitly authorized hotfixes. No authorization present.
Scope violation: 1044 files changed, +99,916 / -29,730 lines. The fix for #4760 is 4 files, +264/-33 (see #4761, which targets develop correctly and is already LGTM 9-gate and awaiting Ema approval). This diff includes workflow files, dashboard test files, e2e specs, internal docs (docs/superpowers/plans/, .claude-internals/), .gitleaks.toml, dependabot.yml, etc. — none of which belong in a #4760 race fix.
Cross-refs:
- #4761 (correct PR, targets develop, 4 files, +264/-33) is the real fix.
- Wrong-base pattern: documented 7 hits from dependabot so far (#4674, #4710, #4747, #4748, #4749, #4762, #4763) — Hermes is the close owner for that pattern.
- This is the first wrong-base from `aegis-gh-agent[bot]` itself (not dependabot). Will be tracked as a separate pattern going forward.
Action:
- Hermes: please close this PR with `state_reason: not_planned` per the established wrong-base protocol. The correct PR is #4761.
- Hephaestus: no action; #4761 is the right surface for this fix.
Argus verdict: REJECTED (cannot REQUEST_CHANGES on App-authored PRs per 2026-06-04/15 self-approval blocker; using COMMENTED to leave audit trail before close).
Summary
Fixes race condition where concurrent `sendPrompt` calls for the same sessionId create duplicate ACP handshakes and runtimes, violating the single-runtime-per-session invariant.

Root Cause
`createSessionAsync` in `acp-backend.ts` was not synchronized: multiple concurrent calls with the same `sessionId` would each create a new `pendingHandshakes` entry and start a new background runtime, because the check for an existing handshake happened before the async boundary.

Fix
- `pendingHandshakes` deduplication: if a handshake for the same `sessionId` is already in flight, await the existing promise instead of starting a new one.
- Returns the first call's `session` + `backendRunId` (not fresh ones) to the duplicate callers, preserving the single-runtime invariant.

TDD Pattern
- `455becb4` - red: failing TDD spec for pendingHandshakes dedup
- `cf9a3e18` - green: dedup concurrent createSessionAsync by sessionId
- `26e81fee` - fix: return first call's session+backendRunId, not fresh ones

Verification
- `npm run gate`: 6289 passed, 10 skipped (0 failed) ✅
- `npx vitest run src/__tests__/acp-sendprompt-handshake-race-4760.test.ts`: 4/4 passed ✅
- `npx tsc --noEmit`: clean ✅

Scope
- `src/services/acp/backend.ts`: dedup logic
- `src/services/acp/backend/prompts.ts`: return type plumbing
- `src/__tests__/acp-sendprompt-handshake-race-4760.test.ts`: new regression test
- `src/__tests__/acp-sendprompt-handshake-race-4738.test.ts`: updated for shared-promise path

Closes #4760