Feat/phase20 memory speak trust by moyunzero · Pull Request #9 · moyunzero/AetherLife

moyunzero · 2026-06-16T14:13:34Z

Summary by CodeRabbit

Release Notes

New Features
- Added incremental NPC reply streaming in the dialogue overlay, including streaming visual styling and live-region announcements.
Bug Fixes
- Improved Phase 20 memory recall robustness: safer handling of malformed recall inputs, more deterministic recall/no-memory behavior, and tighter password/nickname extraction.
- Enhanced movement/physical-action routing for additional “那边/那里/那儿” phrasings, including correct move injection when tool calls are empty.
Tests
- Expanded unit and E2E coverage for overlay streaming, recall merging, and hot snapshot refresh behavior.
Chores
- Added verification, benchmark, and cross-provider review/playtest scripts plus CI guardrails.

- Nickname refusal override case (RED) - Non-recall question leaves LLM reply unchanged Co-authored-by: Cursor <cursoragent@cursor.com>

- Add scripts/lib/dialogue-engage.mjs shared by verify-phase19 and benchmark - Replace drawer .message--thinking with Phase 19 overlay T_think/T_first selectors - Add B_recall case with optional seed; write speak-browser-latest.json Co-authored-by: Cursor <cursoragent@cursor.com>

- extract_nickname + format_recall_answer for 我叫/叫我 patterns - Add 叫什么 recall marker for SOLO-01 nickname questions - Social system prompt: direct answer when memory_summary present Co-authored-by: Cursor <cursoragent@cursor.com>

Cross-session password/nickname recall with overlay speak, refusal guard, latency smoke, and 60s single-retry on 429/timeout/5xx per D-13. Co-authored-by: Cursor <cursoragent@cursor.com>

Wire pnpm verify:phase20 npm script and diff triggers for recall_merge and phase 20 E2E scripts in agent-verify-map. Co-authored-by: Cursor <cursoragent@cursor.com>

Wait for corner-menu connected state, open/close shell drawer for memory callback, extend memory poll default, and fix consecutive-turn baseline detection so verify:phase20 and benchmark:speak-browser pass on Phase 19 shell. Co-authored-by: Cursor <cursoragent@cursor.com>

Malformed schedule zoneIds (e.g. village-square@v1 without :localId) no longer take down Colyseus during ambient ticks; pickZoneTarget falls back to the NPC position so verify:phase20 can complete. Co-authored-by: Cursor <cursoragent@cursor.com>

ISSUE-050: password/nickname recall picks player-seed rows, rejects token echo and ambiguous LLM replies, extends RECALL memory-context timeout, and adds regression tests. Adds OpenAI-compatible GSD review providers (Agnes + NVIDIA NIM) and speak SLA playtest scripts for Phase 20 gates. Co-authored-by: Cursor <cursoragent@cursor.com>

chatgpt-codex-connector · 2026-06-16T14:13:41Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

coderabbitai · 2026-06-16T14:13:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 97a1f1e0-6df0-4644-b20a-15ef04787369

📥 Commits

Reviewing files that changed from the base of the PR and between 7cfd787 and f76c7fa.

📒 Files selected for processing (11)

docs/ISSUE-LOG.md
packages/shared/src/speakIntent.test.ts
packages/shared/src/speakIntent.ts
scripts/verify-overlay-streaming.mjs
workers/agent-worker/src/graph/action_intent.py
workers/agent-worker/src/graph/npc_loop.py
workers/agent-worker/src/graph/recall_merge.py
workers/agent-worker/src/main.py
workers/agent-worker/tests/test_fetch_state_and_memory.py
workers/agent-worker/tests/test_recall_merge.py
workers/agent-worker/tests/test_speak_intent.py

✅ Files skipped from review due to trivial changes (1)

docs/ISSUE-LOG.md

🚧 Files skipped from review as they are similar to previous changes (5)

workers/agent-worker/tests/test_fetch_state_and_memory.py
workers/agent-worker/src/graph/action_intent.py
workers/agent-worker/src/main.py
workers/agent-worker/src/graph/npc_loop.py
workers/agent-worker/src/graph/recall_merge.py

📝 Walkthrough

Walkthrough

This PR fixes Phase 20 cross-session memory recall (ISSUE-050) through deterministic password/nickname selection and reply merging in recall_merge.py; adds recency augmentation and recall-specific timeouts in the NPC loop; implements incremental dialogue-overlay streaming UI; expands movement intent detection; optimizes casual NPC fast-lane response handling; introduces a GSD cross-AI plan review CLI tool; and hardens zone-wander against malformed zone IDs.

Changes

Phase 20 cross-session memory recall fix and worker integration

Layer / File(s)	Summary
recall_merge.py: detection, selection, merging rewrite `workers/agent-worker/src/graph/recall_merge.py`	Rewrites recall-question detection with expanded marker sets and interrogative cues (?, 吗); adds `pick_recall_memory` with password/nickname/fallback routing and recency/embed scoring; tightens `extract_password_answer` and `extract_nickname` to reject recall-question text; introduces `recall_no_memory_reply` for no-match cases; adds `format_recall_answer` specializations for "电脑密码", "门禁密码", and nickname phrasing; adds dialogue-turn augmentation helpers; refactors `merge_recall_into_reply` to use memory selection, cover-check, and no-memory guard.
npc_loop.py, memory/client.py: recency augmentation and recall tuning `workers/agent-worker/src/graph/npc_loop.py`, `workers/agent-worker/src/memory/client.py`	`load_memory_context` conditionally fetches and merges recent memories via `augment_retrieved_with_recent` and `augment_retrieved_with_dialogue_turns` when `needs_recency_augment` is true; `fetch_state_and_memory` selects recall-specific `memory_timeout` (18.0s) and `memory_attempts` (2) when `SpeakIntent.RECALL`; adds `_hot_worker_snapshot` cache with 2-second TTL for zero-HTTP hot fetches; `apply_tools` passes dialogue context to `inject_relative_move_tool` and persists updated snapshot.
memory_quote.py, llm_social_turn.py, main.py: player_message routing and social rules `workers/agent-worker/src/graph/memory_quote.py`, `workers/agent-worker/src/graph/nodes/llm_social_turn.py`, `workers/agent-worker/src/main.py`	`pick_memory_quote` gains `player_message` keyword arg and conditionally routes to `pick_recall_memory` when `is_recall_question` is true; `_build_social_messages` appends behavioral rule requiring direct answers (no refusal) for facts already in memory summary; `run_social_turn_llm` suppresses partial streaming on recall questions and passes `cfg.llm_social_max_tokens`; `process_job` records `t_job` timing, gates social fast-lane on `not player_requests_physical_action(player_message)`, passes `player_message` to `pick_memory_quote`, and stores `t_worker_total_ms`.
Python unit and integration tests for recall and memory `workers/agent-worker/tests/test_recall_merge.py`, `workers/agent-worker/tests/test_memory_quote.py`, `workers/agent-worker/tests/test_fetch_state_and_memory.py`, `workers/agent-worker/tests/test_tool_gate.py`	15+ new tests covering seed/disclosure, fact-token echo, password extraction variants, nickname rejection of recall questions, password priority over nickname embeddings, no-unrelated-fallback, empty-memory hallucination prevention, password determinism, recency preference, ambiguous multi-password resolution, player-seed preference, and no-memory guard; autouse fixture clears worker-snapshot cache; hot-cache, recall-timeout, and tool-injection assertions added.

Phase 20 E2E verification and playtest pipeline

Layer / File(s)	Summary
Shared Playwright helper libraries and framework `scripts/lib/dialogue-engage.mjs`, `scripts/lib/e2e-memory-helpers.mjs`, `scripts/lib/speak-browser-round.mjs`, `scripts/lib/speak-browser-stack.mjs`, `scripts/verify-phase19.mjs`	New modules for dialogue engagement (corner-menu/NPC-tab/canvas-fallback), memory-context polling, shell-drawer control, speak-round timing orchestration, Playwright loading with multi-source fallback, service health checks, room reset/readiness waits, and NPC position tracking. `verify-phase19` refactored to import shared `engageDialogue`.
verify-phase20.mjs: Phase 20 cross-session recall scenario `scripts/verify-phase20.mjs`, `scripts/lib/agent-verify-map.mjs`	E2E script enforcing no-mock policy, performing health checks, booting Playwright, writing two memory tokens via speak, polling server-side memory-context until tokens appear, reloading page, asserting recall response contains password/nickname without refusal, with single retry for retryable errors. `GF-10` golden-flow oracle entry registered.
playtest-speak-sla.mjs and benchmark updates `scripts/playtest-speak-sla.mjs`, `scripts/benchmark-speak-browser.mjs`, `scripts/lib/playtest-findings-patch.mjs`	Multi-session SLA playtest with configurable session count, turn pacing, retryable error classification, per-turn timing and subjective banding, screenshot capture, optional markdown findings patch, and chained `verify:phase20`. Benchmark adds `B_recall` case, overlay-first reply detection, conditional B4 execution with per-sequence error capture, `writeReport`/`printSummary` helpers. Findings-patch handles scorecard table generation and band-based note updates.
verify-overlay-streaming.mjs: overlay streaming metrics `scripts/verify-overlay-streaming.mjs`	E2E script measuring overlay streaming performance via `sendSpeakOverlay`, capturing `overlayPartialMs`, `firstTextMs`, `speakMs`, and `overlayStreamingBeforeDone`, with configurable OVERLAY_STREAMING_REQUIRE and OVERLAY_STREAMING_T_PARTIAL_MS verdicts.

Web UI streaming reply rendering and dialogue-overlay enhancements

Layer / File(s)	Summary
DialogueOverlay.tsx: streaming reply support `apps/web/src/components/DialogueOverlay.tsx`, `apps/web/src/ChatPage.tsx`, `apps/web/src/index.css`	DialogueOverlay gains optional `streamingReply` prop; derives `displayLine` preferring sanitized streaming text; renders with streaming CSS modifier, `data-testid="dialogue-overlay-streaming"`, `role="status"`, and `aria-live="polite"` when streaming; `showThinkingOnly` now depends on `displayLine` presence. ChatPage passes `streamingReply` prop. CSS class `.dialogue-overlay__last-line--streaming` added for streaming text color.
DialogueOverlay.test.ts: streaming and thinking UI `apps/web/src/components/DialogueOverlay.test.ts`	Vitest tests verify streaming reply displays streaming content (omitting thinking-only), streaming takes precedence over prior messages, and thinking placeholder shows when `streamingReply` is null and overlay is busy.

NPC behavior optimization: casual fast-lane short-circuit and social perception refinements

Layer / File(s)	Summary
casual_fast_lane.py: short-circuit optimization `workers/agent-worker/src/graph/casual_fast_lane.py`	Adds `_can_short_circuit_casual_lane` helper that returns false if player requests physical action; otherwise reconciles social perception and checks for skip. `run_casual_fast_lane` uses single `httpx.Client`, builds state early, and returns sanitized `preview.reply` when short-circuit condition is met, avoiding social event, collective refresh, tool, and composition steps.
social_turn.py, schemas.py: help-request refinement and field reordering `workers/agent-worker/src/collective/social_turn.py`, `workers/agent-worker/src/collective/schemas.py`	`infer_social_from_message` uses `_message_implies_help_request` helper that detects "帮" explicitly and "请" after stripping "回复"/"请假" to avoid false positives. `SocialTurnOut` reorders fields to place `reply` before `social` with docstring noting streaming alignment.
Social perception and JSON streaming tests `workers/agent-worker/tests/test_social_stream_extract.py`, `workers/agent-worker/tests/test_social_turn.py`, `workers/agent-worker/tests/test_llm_social_memory.py`, `workers/agent-worker/tests/test_casual_fast_lane.py`	Unit and integration tests for reply-first JSON streaming extraction, parsing, and `run_social_turn_llm` partial emissions; `test_infer_help_not_triggered_by_hui_fu_substring`; extended memory-summary assertions; `test_run_casual_fast_lane_short_circuits_zero_side_effect`; `test_apply_tools_injects_move_when_physical_and_tool_calls_empty`.

Movement and action intent pattern expansion

Layer / File(s)	Summary
Move pattern expansion and fallback tool injection `packages/shared/src/speakIntent.ts`, `packages/shared/src/speakIntent.test.ts`, `workers/agent-worker/src/graph/action_intent.py`, `workers/agent-worker/tests/test_action_intent.py`, `docs/ISSUE-LOG.md`	Adds "那边\|那里\|那儿" regex to MOVE_PATTERNS in both JS and Python; extends `resolve_npc_relative_move_cell` adjacency keyword matching. Tests extended to validate movement classification and `inject_relative_move_tool` resolution for relay-summon to different NPCs; `test_apply_tools_injects_move_when_physical_and_tool_calls_empty` verifies fallback injection when `tool_calls` is empty. ISSUE-LOG documents relay-movement guardrails.

LLM model configuration and constraints

Layer / File(s)	Summary
LLM factory and configuration updates `workers/agent-worker/src/llm/factory.py`, `workers/agent-worker/src/config.py`	`create_chat_model` adds optional `max_tokens` parameter; for non-cerebras providers, includes `max_tokens` in `ChatOpenAI` kwargs when provided and positive. `config.py` adds `llm_social_max_tokens` field (default 384) documented as social JSON length cap. `llm_social_turn.py` tightens `SOCIAL_SYSTEM_PROMPT` to enforce "output JSON only" and reply-before-social ordering; extends both system and human prompts with JSON-only and reply-first instructions.

GSD cross-AI plan review CLI tooling

Layer / File(s)	Summary
GSD OpenAI client and provider configuration `scripts/lib/gsd-openai-review.mjs`, `scripts/lib/gsd-review-providers-config.mjs`	`runOpenAiReview` with AbortController timeout and normalized results; `probeOpenAiProvider` for connectivity testing; `DEFAULT_OPENAI_PROVIDERS` for agnes/nvidia; `loadOpenAiProviders` merges config overrides; `resolveProviderRuntime`, `resolvePhaseDir`, and temp-path helpers.
GSD review merge and top-level CLI `scripts/lib/gsd-review-merge.mjs`, `scripts/gsd-review-providers.mjs`, `package.json`	`mergeProviderReviews` upserts per-provider sections into `-REVIEWS.md` and updates YAML frontmatter. CLI supports `--probe`, `--phase`, `--provider`, `--merge` modes. `gsd:review:` npm scripts added for provider review workflow.

Zone wander malformed ID safety

Layer / File(s)	Summary
findZone safe parsing and fallback test `apps/game-server/src/ambient/zone-wander.ts`, `apps/game-server/src/ambient/zone-wander.test.ts`	`findZone` wraps `parseZoneId` in `try/catch`, returns `undefined` on parse failure. New Vitest test asserts malformed `segment.zoneId` causes `pickZoneTarget` to fall back to NPC's current `(x, y)`.

Infrastructure, configuration, and documentation updates

Layer / File(s)	Summary
Refusal-markers parity check, ISSUE-LOG, and configuration `scripts/assert-refusal-markers-parity.mjs`, `docs/ISSUE-LOG.md`, `eslint.config.js`	CI script verifies JS and Python `REFUSAL_MARKERS` parity by parsing Python tuple assignment. ISSUE-LOG documents ISSUE-050 through ISSUE-054 entries covering cross-session recall failures, relay movement issues, overlay streaming flicker, and hot-snapshot refresh; refined Phase 20 guardrails. ESLint config expands files glob and adds browser/runtime globals as read-only.
package.json Phase 20 and GSD review scripts `package.json`	Adds `verify:phase20`, `verify:overlay-streaming`, `playtest:speak-sla`, `uat:phase20:playwright`, `gsd:review:probe`, `gsd:review:agnes`, `gsd:review:nvidia`, `gsd:review:providers` scripts.

Sequence Diagram(s)

sequenceDiagram
    participant E2E as verify-phase20.mjs
    participant Browser as Playwright
    participant UI as Web UI (Dialogue)
    participant GameServer as Game Server
    participant AgentWorker as Agent Worker
    participant MemoryDB as Memory Storage
    
    E2E->>GameServer: healthOk() GET /health
    E2E->>Browser: launch chromium → navigate room
    Browser->>UI: waitRoomReady() canvas visible
    E2E->>UI: engageDialogue(page)
    E2E->>UI: sendSpeakOverlay("密码是 sunbet42")
    UI->>AgentWorker: speak turn + player_message
    AgentWorker->>MemoryDB: store memory token
    E2E->>GameServer: waitForMemoryContext(needle="sunbet42")
    GameServer-->>E2E: memory context contains token
    E2E->>Browser: page.reload()
    E2E->>UI: engageDialogue(page)
    E2E->>UI: sendSpeakOverlay("门禁密码是多少？")
    UI->>AgentWorker: speak turn SpeakIntent.RECALL
    AgentWorker->>MemoryDB: load_memory_context (timeout=18s, attempts=2)
    MemoryDB-->>AgentWorker: retrieved memories + recency
    AgentWorker->>AgentWorker: pick_recall_memory → merge_recall_into_reply
    AgentWorker-->>UI: reply with "sunbet42"
    UI-->>E2E: NPC reply verified (no refusal)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

moyunzero/AetherLife#3: Introduces the zone-wander pickZoneTarget and findZone module; this PR adds a try/catch safety wrapper to findZone and a regression test for malformed zone ID handling.
moyunzero/AetherLife#7: Introduces the DialogueOverlay component and Chat UI infrastructure; this PR adds streaming reply support via new streamingReply prop and corresponding UI state/rendering updates.

Poem

🐇 Hop, hop through memory lanes so long,
A password lost, now found and strong.
Once "I forget" upon recall—
Now sunbet42 answers all!
With streaming UI and zones made safe,
The rabbit's test suite seals the waif. 🌟

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.92% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Feat/phase20 memory speak trust' is vague and does not clearly summarize the main change. It uses broad terminology without specific detail about what memory, speak, and trust improvements are being implemented.	Provide a more specific title that describes the concrete change, such as 'Add Phase 20 memory recall and overlay streaming improvements' or 'Implement deterministic recall merge and streaming UI for Phase 20'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/phase20-memory-speak-trust

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 15

🧹 Nitpick comments (1)

scripts/benchmark-speak-browser.mjs (1)
136-145: 💤 Low value

Acceptable duplication for self-containment.

These selectors duplicate constants from e2e-memory-helpers.mjs. This is intentional to keep the benchmark script self-contained without importing the full helpers module. Consider adding a comment noting the canonical source to help maintainers keep them in sync.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/benchmark-speak-browser.mjs` around lines 136 - 145, The
THINKING_LOCATOR and OVERLAY_NPC_REPLY constants in the benchmark script
intentionally duplicate selectors from e2e-memory-helpers.mjs for
self-containment. Add a comment above these constant definitions documenting
that they are duplicated from the canonical source in e2e-memory-helpers.mjs and
noting that they should be kept in sync with the originals if either location is
updated.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/assert-refusal-markers-parity.mjs`:
- Around line 20-21: The regex pattern in the matchAll call at line 20 only
matches double-quoted strings, causing it to miss single-quoted Python literals.
Modify the regex pattern to capture both single-quoted and double-quoted strings
by updating it to match either quote type, then ensure the captured marker is
extracted correctly from whichever quote type was matched (accounting for the
fact that alternation with multiple capture groups will have one group undefined
depending on which pattern matched).

In `@scripts/gsd-review-providers.mjs`:
- Around line 33-40: Update the ESLint configuration file (eslint.config.js) to
properly cover all root-level script files. Modify the files glob pattern on
line 7 from the current narrow pattern that only covers scripts/lib/**/*.mjs and
scripts/agent-verify.mjs to the broader pattern ["scripts/**/*.mjs"] to include
all .mjs files in the scripts directory. Additionally, add AbortController:
"readonly" to the globals section (lines 11-19) to properly declare the
AbortController global that is used in scripts/verify-llm-models.mjs and
scripts/lib/gsd-openai-review.mjs.

In `@scripts/lib/agent-verify-map.mjs`:
- Around line 156-161: The triggers array for the GF-10 check (in the
configuration starting at line 156) is missing patterns for three Phase 20
helper surface files that directly affect Phase 20 E2E behavior. Add regex
patterns to the triggers array to include
`/scripts\/lib\/speak-browser-round\.mjs$/`,
`/scripts\/lib\/speak-browser-stack\.mjs$/`, and
`/scripts\/lib\/dialogue-engage\.mjs$/` so that changes to these helper files
will automatically select the verify:phase20 check.

In `@scripts/lib/gsd-openai-review.mjs`:
- Around line 91-93: The `await res.json()` call at line 91 in the
`runOpenAiReview` function can throw an exception when the response body is not
valid JSON (e.g., HTML returned by a proxy with HTTP 200 status), causing the
entire multi-provider run to crash. Wrap the res.json() call and the subsequent
content extraction logic in a try-catch block, and return a structured failure
object (matching the expected return type of the function) instead of allowing
the exception to propagate. This ensures that JSON parsing errors are handled
gracefully and do not terminate the provider loop in
scripts/gsd-review-providers.mjs.

In `@scripts/lib/gsd-review-merge.mjs`:
- Around line 16-18: Remove the unused variable declarations to resolve ESLint
no-unused-vars errors. In scripts/lib/gsd-review-merge.mjs at lines 16-18,
delete the entire paddedPrefix variable declaration that checks phaseNum and
performs string manipulation. At line 54, delete the heading variable
declaration. Both variables are unused and their removal will clear the static
analysis violations.
- Around line 134-145: The `reviewers_failed` section in the frontmatter becomes
stale because the code intentionally skips updating it when the block already
exists. Currently, the condition that checks for existing `reviewers_failed:`
with the regex test at line 138 only has a comment explaining why the merge is
skipped, but does not actually update the block. To fix this, replace the
existing `reviewers_failed:` block in the frontmatter with the new `failLines`
content whenever it's detected, rather than leaving it unchanged. This ensures
the list stays current with actual failures from each run, preventing stale
entries from persisting and ensuring new failures are captured. Apply the same
replacement logic (using the replace method with the pattern) in both the if and
else branches to keep the behavior consistent.

In `@scripts/lib/gsd-review-providers-config.mjs`:
- Around line 82-87: The code at lines 82-87 wraps the numeric values for
timeoutMs and maxTokens with Number() without validating the result, which can
produce NaN if the input values are invalid. After calling Number() on the
override and defaults values for both timeoutMs and maxTokens, add validation to
ensure the result is not NaN. If the conversion results in NaN, either throw a
descriptive error indicating invalid configuration or fall back to a valid
default value. This ensures that only valid numeric values are assigned to
timeoutMs and maxTokens, preventing downstream issues like immediate aborts from
setTimeout or invalid max_tokens values sent to external services.
- Around line 108-111: The code at lines 108-111 has a path traversal
vulnerability where the normalized phase parameter could contain traversal
segments like `../` that allow escaping from the `.planning/phases` directory.
When calling `resolve(phasesRoot, normalized)`, validate that the resulting
directory path stays within `phasesRoot` before accepting it. Use the
`relative()` function to check that the resolved path does not escape the root,
or verify that the resolved absolute path starts with the `phasesRoot` absolute
path, and only return the directory if this validation passes.

In `@scripts/lib/speak-browser-round.mjs`:
- Around line 32-40: The function `waitForFirstNpcReply` currently has a
signature mismatch with its caller in `scripts/lib/e2e-memory-helpers.mjs`. The
caller invokes it with only 2 arguments (page and timeoutMs) and expects a
timestamp return value, but the current function signature requires 3 parameters
(page, baseline, timeoutMs) and returns a string. Restore the correct contract
by removing the baseline parameter, capturing an initial baseline within the
function before the polling loop, and changing the return value from
`current.slice(0, 200)` to `Date.now()` so that the function returns the
timestamp when the first NPC reply change is detected.
- Around line 87-92: The performance.getEntriesByName("speak_partial") call is
checking all historical entries globally, which allows old entries from previous
rounds to incorrectly indicate this round had a partial response. Modify the
evaluation to filter the returned entries so that only those with timing values
that occurred after the start of this round (comparing against t0) are counted.
This ensures the partial check is scoped to the current round only and prevents
old entries from masking real first-reply timeouts.

In `@scripts/lib/speak-browser-stack.mjs`:
- Around line 27-30: The requireFromWeb function call on line 29 uses invalid
Node.js require() syntax by including a version specifier in the module string
format `@playwright/test@${ver}`. Node.js require() does not support version
specifiers in require strings. Remove the version specifier portion from the
module string passed to requireFromWeb, changing it to just require the base
module name without the @${ver} suffix. Alternatively, if the intention is to
handle different versions dynamically, consider a different approach that aligns
with Node.js module resolution, or verify whether this fallback mechanism is
viable given that `@playwright/test` is not listed as a dependency in
apps/web/package.json.

In `@scripts/playtest-speak-sla.mjs`:
- Line 86: The error message in the throw statement that guides users to install
playwright dependencies currently directs them to use `npm install`, but per
monorepo guidelines this should use `pnpm` instead. Update the error message
string to replace `npm install` with `pnpm` to align with the monorepo's package
management standards.
- Around line 47-52: The numeric environment variable parsing on these lines can
produce NaN values when environment variables contain invalid input (non-numeric
strings), which breaks control flow in retry loops and timeouts. Wrap each
Number.parseInt call with validation to ensure it returns a valid number,
providing a sensible default fallback when parsing fails. For variables like
retryBackoffMs, turnGapMs, and minSessionMs that currently lack Math.max()
guards, add explicit NaN checks or use a helper function that validates the
parsed result and returns the default value if NaN is produced. This ensures
speakAttempts, retryBackoffMs, sessionCount, turnGapMs, and minSessionMs all
receive valid numeric values regardless of invalid environment input.

In `@workers/agent-worker/src/graph/nodes/llm_social_turn.py`:
- Line 310: The string on line 310 in llm_social_turn.py triggers a RUF001 lint
warning due to confusable fullwidth comma characters (，) embedded in the Chinese
text. Replace all fullwidth commas in this string with regular ASCII commas (,)
to resolve the warning. The string beginning with "若玩家追问 Memory summary 中已有的事实"
contains at least two fullwidth commas that need conversion.

In `@workers/agent-worker/src/graph/recall_merge.py`:
- Around line 168-173: The extract_nickname() function currently extracts
nicknames from any memory text without distinguishing between question rows and
fact rows. When a question row like "player: 我叫什么？" is processed, it incorrectly
extracts "什么" as a nickname, causing pick_recall_memory() to select questions as
facts. Modify extract_nickname() to return None if the memory_text is a question
row (contains question markers like ？ or ?), ensuring nicknames are only
extracted from factual memory rows, not question rows.

---

Nitpick comments:
In `@scripts/benchmark-speak-browser.mjs`:
- Around line 136-145: The THINKING_LOCATOR and OVERLAY_NPC_REPLY constants in
the benchmark script intentionally duplicate selectors from
e2e-memory-helpers.mjs for self-containment. Add a comment above these constant
definitions documenting that they are duplicated from the canonical source in
e2e-memory-helpers.mjs and noting that they should be kept in sync with the
originals if either location is updated.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 544d5021-a6d7-45be-9943-39b698d34bd9

📥 Commits

Reviewing files that changed from the base of the PR and between f923111 and 2d35183.

📒 Files selected for processing (28)

apps/game-server/src/ambient/zone-wander.test.ts
apps/game-server/src/ambient/zone-wander.ts
docs/ISSUE-LOG.md
package.json
scripts/assert-refusal-markers-parity.mjs
scripts/benchmark-speak-browser.mjs
scripts/gsd-review-providers.mjs
scripts/lib/agent-verify-map.mjs
scripts/lib/dialogue-engage.mjs
scripts/lib/e2e-memory-helpers.mjs
scripts/lib/gsd-openai-review.mjs
scripts/lib/gsd-review-merge.mjs
scripts/lib/gsd-review-providers-config.mjs
scripts/lib/playtest-findings-patch.mjs
scripts/lib/speak-browser-round.mjs
scripts/lib/speak-browser-stack.mjs
scripts/playtest-speak-sla.mjs
scripts/verify-phase19.mjs
scripts/verify-phase20.mjs
workers/agent-worker/src/graph/memory_quote.py
workers/agent-worker/src/graph/nodes/llm_social_turn.py
workers/agent-worker/src/graph/npc_loop.py
workers/agent-worker/src/graph/recall_merge.py
workers/agent-worker/src/main.py
workers/agent-worker/src/memory/client.py
workers/agent-worker/tests/test_fetch_state_and_memory.py
workers/agent-worker/tests/test_memory_quote.py
workers/agent-worker/tests/test_recall_merge.py

coderabbitai · 2026-06-16T14:33:44Z

+  console.error(`Usage:
+  node scripts/gsd-review-providers.mjs --phase <N> [--provider agnes|nvidia|all] [--prompt-file PATH] [--merge] [--probe]
+
+Examples:
+  pnpm gsd:review:providers -- --phase 20 --merge
+  pnpm gsd:review:providers -- --phase 20 --provider nvidia
+  pnpm gsd:review:providers -- --probe`);
+  process.exit(1);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== ESLint config files ==" fd -HI 'eslint.config.*' fd -HI '.eslintrc*' echo echo "== Node/global declarations in ESLint config ==" rg -n --hidden --iglob 'eslint.config.*' --iglob '.eslintrc*' 'env|globals|node|process|console|AbortController|files|scripts' echo echo "== Current static-analysis-reported symbols in changed scripts ==" rg -n '\b(process|console|AbortController)\b' scripts/gsd-review-providers.mjs scripts/lib/gsd-openai-review.mjs

Repository: moyunzero/AetherLife

Length of output: 5228

🏁 Script executed:

cat -n eslint.config.js

Repository: moyunzero/AetherLife

Length of output: 860

🏁 Script executed:

fd -maxdepth 1 '\.mjs$' scripts/

Repository: moyunzero/AetherLife

Length of output: 295

🏁 Script executed:

fd --max-depth 1 '\.mjs$' scripts/

Repository: moyunzero/AetherLife

Length of output: 1617

🏁 Script executed:

rg -l '\b(console|process|AbortController)\b' scripts/*.mjs | head -20

Repository: moyunzero/AetherLife

Length of output: 612

🏁 Script executed:

rg -l '\b(console|process|AbortController)\b' scripts/*.mjs | wc -l

Repository: moyunzero/AetherLife

Length of output: 67

🏁 Script executed:

rg -l '\bAbortController\b' scripts/*.mjs

Repository: moyunzero/AetherLife

Length of output: 94

Update ESLint config to cover all root-level script files and declare AbortController global.

The files glob in eslint.config.js line 7 is too narrow: it covers only scripts/lib/**/*.mjs and scripts/agent-verify.mjs, leaving 48+ other root-level .mjs scripts (including scripts/gsd-review-providers.mjs) uncovered. Additionally, AbortController is used in scripts/verify-llm-models.mjs and scripts/lib/gsd-openai-review.mjs but is not declared in the globals (lines 11–19).

Expand the files glob to ["scripts/**/*.mjs"] and add AbortController: "readonly" to the globals section.

🧰 Tools

🪛 ESLint

[error] 33-33: 'console' is not defined.

(no-undef)

[error] 40-40: 'process' is not defined.

(no-undef)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/gsd-review-providers.mjs` around lines 33 - 40, Update the ESLint configuration file (eslint.config.js) to properly cover all root-level script files. Modify the files glob pattern on line 7 from the current narrow pattern that only covers scripts/lib/**/*.mjs and scripts/agent-verify.mjs to the broader pattern ["scripts/**/*.mjs"] to include all .mjs files in the scripts directory. Additionally, add AbortController: "readonly" to the globals section (lines 11-19) to properly declare the AbortController global that is used in scripts/verify-llm-models.mjs and scripts/lib/gsd-openai-review.mjs.

Source: Linters/SAST tools

coderabbitai · 2026-06-16T14:33:46Z

-        system_text = f"{system_text}\n\nMemory summary:\n{memory}"
+        system_text = (
+            f"{system_text}\n\nMemory summary:\n{memory}\n"
+            "若玩家追问 Memory summary 中已有的事实，reply 须直接给出答案，勿拒绝或说「不记得」。"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve Ruff RUF001 on Line 310 by replacing the confusable fullwidth comma.

This string currently triggers RUF001; a plain ASCII comma removes the lint warning without changing behavior.

Proposed surgical diff

- "若玩家追问 Memory summary 中已有的事实，reply 须直接给出答案，勿拒绝或说「不记得」。" + "若玩家追问 Memory summary 中已有的事实, reply 须直接给出答案, 勿拒绝或说「不记得」。"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"若玩家追问 Memory summary 中已有的事实，reply 须直接给出答案，勿拒绝或说「不记得」。"

"若玩家追问 Memory summary 中已有的事实, reply 须直接给出答案, 勿拒绝或说「不记得」。"

🧰 Tools

🪛 Ruff (0.15.17)

[warning] 310-310: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)

[warning] 310-310: String contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workers/agent-worker/src/graph/nodes/llm_social_turn.py` at line 310, The string on line 310 in llm_social_turn.py triggers a RUF001 lint warning due to confusable fullwidth comma characters (，) embedded in the Chinese text. Replace all fullwidth commas in this string with regular ASCII commas (,) to resolve the warning. The string beginning with "若玩家追问 Memory summary 中已有的事实" contains at least two fullwidth commas that need conversion.

Source: Linters/SAST tools

…nchmarks Harden speak latency path (casual fast lane, fetch_state cache, phaseTimingMs), fix relay move phrases and apply_tools physical inject (ISSUE-051/052), show speakPartial in DialogueOverlay, and make speak-browser benchmark resilient to timeouts and room reset failures. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

workers/agent-worker/src/main.py (1)
290-294: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use original payload message as fallback for recall quote routing.

At Line 293, pick_memory_quote() only receives result.get("player_message"). If a fast-lane result omits this field, recall detection won’t trigger and quote selection can regress to non-recall behavior.
Proposed surgical diff
     memory_quote = pick_memory_quote(
         result.get("retrieved_memories"),
         int(result.get("memory_count") or 0),
-        player_message=str(result.get("player_message") or ""),
+        player_message=str(result.get("player_message") or player_message or ""),
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/agent-worker/src/main.py` around lines 290 - 294, The
pick_memory_quote function call is passing player_message from the result
without a fallback mechanism, which causes recall detection to fail when
fast-lane results omit this field. Modify the player_message parameter passed to
pick_memory_quote to use the original payload message as a fallback when
result.get("player_message") is not available, ensuring recall behavior works
consistently regardless of whether the fast-lane result includes the
player_message field.
scripts/benchmark-speak-browser.mjs (1)
132-132: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Switch the install hint to pnpm.

Line 132 still recommends npm install, which is inconsistent with monorepo command conventions.

As per coding guidelines, "Use pnpm instead of npm or yarn for all package management commands in the monorepo."
Suggested patch
-    throw new Error("playwright 未安装：cd scripts/.pw-deps && npm install");
+    throw new Error("playwright 未安装：cd scripts/.pw-deps && pnpm install");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/benchmark-speak-browser.mjs` at line 132, The error message in the
throw statement at line 132 recommends using npm install, which violates the
monorepo convention of using pnpm for all package management commands. Replace
the npm install reference with pnpm install in the error message to align with
project standards.
Source: Coding guidelines

🧹 Nitpick comments (4)

apps/web/src/components/DialogueOverlay.test.ts (1)

27-68: ⚡ Quick win

Add assertions for streaming ARIA contract (role="status" + aria-live="polite").

These attributes were added in the component but are not validated in tests, so accessibility regressions could slip through.

Suggested test diff

   it("shows streaming text instead of thinking-only placeholder", () => {
     const html = renderToStaticMarkup(
       createElement(DialogueOverlay, {
         ...baseProps,
         streamingReply: "你好呀，今天想聊点什么？",
       }),
     );
     expect(html).toContain('data-testid="dialogue-overlay-streaming"');
+    expect(html).toContain('role="status"');
+    expect(html).toContain('aria-live="polite"');
+    expect(html).toContain("dialogue-overlay__last-line--streaming");
     expect(html).toContain("你好呀，今天想聊点什么？");
     expect(html).not.toContain("dialogue-overlay__thinking");
   });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/web/src/components/DialogueOverlay.test.ts` around lines 27 - 68, The
tests for DialogueOverlay component are not validating the ARIA accessibility
attributes (role="status" and aria-live="polite") that should be present on the
streaming dialogue overlay element. Add assertions in the test cases that
involve streamingReply content (specifically in "shows streaming text instead of
thinking-only placeholder" and "prefers streaming partial over stale last
message while thinking") to verify that the rendered HTML contains both
role="status" and aria-live="polite" attributes on the streaming element to
ensure the accessibility contract is maintained and prevent future regressions.

workers/agent-worker/tests/test_action_intent.py (1)

222-234: ⚡ Quick win

Add one negative boundary case for “那里/那边/那儿” to prevent regressions.

These new assertions are all positive paths. Please add a non-move sentence (e.g., “那里有什么故事？”) asserting player_requests_move(...) is False and intent is not PHYSICAL.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/agent-worker/tests/test_action_intent.py` around lines 222 - 234, The
test cases shown all verify positive paths where movement is requested. Add a
negative boundary case to prevent regressions by including a test assertion for
a sentence that contains the location words "那里/那边/那儿" but does NOT request
movement (such as "那里有什么故事？"). After creating this non-move test message
variable, assert that player_requests_move() returns False for this message, and
assert that classify_speak_intent() returns a value other than
SpeakIntent.PHYSICAL, similar to how the existing positive test cases verify
these conditions for their respective messages.

workers/agent-worker/tests/test_casual_fast_lane.py (1)

83-107: ⚡ Quick win

Add a physical-action non-short-circuit regression test.

This new test only locks the short-circuit path. Add one case where player_requests_physical_action(...) is true and assert the pipeline does not short-circuit (i.e., social/tool stages run).

Suggested diff

+def test_run_casual_fast_lane_physical_request_does_not_short_circuit():
+    preview = _preview_for("你好，用一句话简短回复")
+    settings = Settings(game_server_url="http://127.0.0.1:2567")
+
+    with patch("src.graph.casual_fast_lane.fetch_state") as fetch_state:
+        fetch_state.return_value = {"room_id": "default", "room_snapshot": {"npcs": []}}
+        with patch("src.graph.casual_fast_lane.player_requests_physical_action", return_value=True):
+            with patch("src.graph.casual_fast_lane.apply_social_event", side_effect=lambda s: s) as apply_social:
+                with patch("src.graph.casual_fast_lane.apply_tools", side_effect=lambda s, **_: s) as apply_tools:
+                    with patch("src.graph.casual_fast_lane.compose_reply", side_effect=lambda s: {**s, "reply": "ok"}):
+                        with patch("src.graph.job_context.record_phase_ms"):
+                            run_casual_fast_lane(
+                                room_id="default",
+                                player_message="请你走到那边",
+                                npc_id="npc-1",
+                                player_id="p1",
+                                recent_turns=[],
+                                preview=preview,
+                                settings=settings,
+                            )
+    apply_social.assert_called_once()
+    apply_tools.assert_called_once()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/agent-worker/tests/test_casual_fast_lane.py` around lines 83 - 107,
Add a new test function to verify the non-short-circuit regression case for when
physical actions are requested. Create a test that sets up the same mocks and
calls to run_casual_fast_lane as the existing
test_run_casual_fast_lane_short_circuits_zero_side_effect, but configure the
preview or player state to make player_requests_physical_action return true. In
this new test, assert that apply_social and apply_tools ARE called (not
assert_not_called), demonstrating that the pipeline proceeds through the normal
stages instead of short-circuiting when a physical action is requested.

workers/agent-worker/tests/test_social_turn.py (1)

130-133: ⚡ Quick win

Cover the 请假 normalization branch explicitly.

The new helper strips both 回复 and 请假, but this test only verifies the 回复 case. Add a 请假 assertion to lock the second false-positive guard.

Suggested diff

 def test_infer_help_not_triggered_by_hui_fu_substring():
     assert infer_social_from_message("你好，用一句话简短回复") is None
-    assert infer_social_from_message("请帮帮我") is not None
-    assert infer_social_from_message("请帮帮我").kind == "help"
+    assert infer_social_from_message("我今天请假") is None
+    inferred = infer_social_from_message("请帮帮我")
+    assert inferred is not None
+    assert inferred.kind == "help"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/agent-worker/tests/test_social_turn.py` around lines 130 - 133, The
test function test_infer_help_not_triggered_by_hui_fu_substring currently only
verifies the 回复 substring normalization case but does not cover the 请假 substring
normalization case. Add a new assertion in this test function that calls
infer_social_from_message with a message containing 请假 to verify that this
substring is also properly normalized and does not trigger false-positive help
detection, ensuring both false-positive guards are properly tested.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@eslint.config.js`:
- Around line 7-14: Add the missing script file `scripts/verify-phase20.mjs` to
the `files` array in the ESLint override configuration at the location shown in
the diff. This script uses globals like `AbortSignal` and `localStorage` that
need the override rules to avoid triggering `no-undef` errors. Simply include
the string "scripts/verify-phase20.mjs" as an additional entry in the files
array along with the other listed scripts.

In `@packages/shared/src/speakIntent.ts`:
- Line 56: The regex pattern at line 56 matching `/那边|那里|那儿/` is too broad and
matches plain location references without movement context, causing incorrect
PHYSICAL intent classification. The pattern should be refined to only match
these location terms when they appear in contexts that indicate actual movement
or directional intent, such as after motion verbs or directional markers. Update
the regex to require contextual indicators (like preceding directional or
movement-related characters/patterns) before matching these location terms, so
that standalone queries like "那里有什么历史？" are not misclassified as movement
intents at the PHYSICAL classification stage around line 115.

In `@scripts/benchmark-speak-browser.mjs`:
- Around line 149-150: The ROOM_ID variable is being interpolated directly into
URL path segments without proper encoding, which can cause issues if it contains
special characters like `/`, `?`, or `#`. Wrap the ROOM_ID with
encodeURIComponent() when interpolating it into the fetch URL template literals.
This applies to both the fetch call in the reset endpoint (line 149) and the
other occurrence mentioned at line 160, ensuring the ROOM_ID is properly
URL-encoded before being used in the path.

In `@scripts/lib/speak-browser-stack.mjs`:
- Around line 46-48: The error message in the throw statement indicates users
should run npm install, but this conflicts with the monorepo's package
management standards. In the error message within the throw new Error call,
replace the text "npm install" with "pnpm install" to align with the monorepo
convention of using pnpm for all package management commands.

In `@scripts/verify-overlay-streaming.mjs`:
- Around line 53-63: The browser cleanup is deferred too long, leaving the
Chromium process open if chromium.launch(), newContext(), addInitScript(), or
newPage() throws an error. Restructure the code by moving the try-finally block
to immediately wrap everything starting after chromium.launch(), so that the
finally block containing browser.close() always executes regardless of which
operation fails. This ensures the Chromium process is properly cleaned up even
if any initialization step throws an exception.
- Line 31: The variable tPartialWarnMs is parsed from the
OVERLAY_STREAMING_T_PARTIAL_MS environment variable without validation, which
means invalid values like "abc" result in NaN instead of failing fast. After
parsing the environment variable with Number.parseInt in the tPartialWarnMs
assignment, add a validation check to ensure the result is a valid number (not
NaN), and throw or exit with an error message if the environment variable
contains an invalid value, preventing silent failures where NaN disables
downstream warning comparisons.
- Around line 29-32: The file `scripts/verify-overlay-streaming.mjs` is not
included in the ESLint configuration's `files` array in `eslint.config.js`,
causing no-undef errors for Node.js globals like `process`. Update the
`eslint.config.js` file to add `scripts/verify-overlay-streaming.mjs` (or an
appropriate glob pattern to cover all scripts) to the `files` array in the
configuration object that handles Node.js environment globals, ensuring the
ESLint rules properly recognize process and other Node.js built-in globals in
this file.

In `@workers/agent-worker/src/graph/action_intent.py`:
- Line 26: The regex pattern containing the standalone Chinese location words
`那边|那里|那儿` on line 26 is too permissive when used with `re.search()` in the
`player_requests_move` function (line 54), causing it to match non-movement
sentences and incorrectly classify them as move intents. Add word boundary
constraints or contextual patterns to these Chinese location words so they only
match when they are actually part of a movement action request, keeping the
implementation consistent with the shared TypeScript matcher to reduce false
positives.

---

Outside diff comments:
In `@scripts/benchmark-speak-browser.mjs`:
- Line 132: The error message in the throw statement at line 132 recommends
using npm install, which violates the monorepo convention of using pnpm for all
package management commands. Replace the npm install reference with pnpm install
in the error message to align with project standards.

In `@workers/agent-worker/src/main.py`:
- Around line 290-294: The pick_memory_quote function call is passing
player_message from the result without a fallback mechanism, which causes recall
detection to fail when fast-lane results omit this field. Modify the
player_message parameter passed to pick_memory_quote to use the original payload
message as a fallback when result.get("player_message") is not available,
ensuring recall behavior works consistently regardless of whether the fast-lane
result includes the player_message field.

---

Nitpick comments:
In `@apps/web/src/components/DialogueOverlay.test.ts`:
- Around line 27-68: The tests for DialogueOverlay component are not validating
the ARIA accessibility attributes (role="status" and aria-live="polite") that
should be present on the streaming dialogue overlay element. Add assertions in
the test cases that involve streamingReply content (specifically in "shows
streaming text instead of thinking-only placeholder" and "prefers streaming
partial over stale last message while thinking") to verify that the rendered
HTML contains both role="status" and aria-live="polite" attributes on the
streaming element to ensure the accessibility contract is maintained and prevent
future regressions.

In `@workers/agent-worker/tests/test_action_intent.py`:
- Around line 222-234: The test cases shown all verify positive paths where
movement is requested. Add a negative boundary case to prevent regressions by
including a test assertion for a sentence that contains the location words
"那里/那边/那儿" but does NOT request movement (such as "那里有什么故事？"). After creating
this non-move test message variable, assert that player_requests_move() returns
False for this message, and assert that classify_speak_intent() returns a value
other than SpeakIntent.PHYSICAL, similar to how the existing positive test cases
verify these conditions for their respective messages.

In `@workers/agent-worker/tests/test_casual_fast_lane.py`:
- Around line 83-107: Add a new test function to verify the non-short-circuit
regression case for when physical actions are requested. Create a test that sets
up the same mocks and calls to run_casual_fast_lane as the existing
test_run_casual_fast_lane_short_circuits_zero_side_effect, but configure the
preview or player state to make player_requests_physical_action return true. In
this new test, assert that apply_social and apply_tools ARE called (not
assert_not_called), demonstrating that the pipeline proceeds through the normal
stages instead of short-circuiting when a physical action is requested.

In `@workers/agent-worker/tests/test_social_turn.py`:
- Around line 130-133: The test function
test_infer_help_not_triggered_by_hui_fu_substring currently only verifies the 回复
substring normalization case but does not cover the 请假 substring normalization
case. Add a new assertion in this test function that calls
infer_social_from_message with a message containing 请假 to verify that this
substring is also properly normalized and does not trigger false-positive help
detection, ensuring both false-positive guards are properly tested.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a20e7207-69b2-423f-9a83-644036f862b4

📥 Commits

Reviewing files that changed from the base of the PR and between 2d35183 and ee7ab4d.

📒 Files selected for processing (37)

apps/web/src/ChatPage.tsx
apps/web/src/components/DialogueOverlay.test.ts
apps/web/src/components/DialogueOverlay.tsx
apps/web/src/index.css
docs/ISSUE-LOG.md
eslint.config.js
package.json
packages/shared/src/speakIntent.ts
scripts/assert-refusal-markers-parity.mjs
scripts/benchmark-speak-browser.mjs
scripts/lib/agent-verify-map.mjs
scripts/lib/e2e-memory-helpers.mjs
scripts/lib/gsd-openai-review.mjs
scripts/lib/gsd-review-merge.mjs
scripts/lib/gsd-review-providers-config.mjs
scripts/lib/speak-browser-round.mjs
scripts/lib/speak-browser-stack.mjs
scripts/playtest-speak-sla.mjs
scripts/verify-overlay-streaming.mjs
workers/agent-worker/src/collective/schemas.py
workers/agent-worker/src/collective/social_turn.py
workers/agent-worker/src/config.py
workers/agent-worker/src/graph/action_intent.py
workers/agent-worker/src/graph/casual_fast_lane.py
workers/agent-worker/src/graph/nodes/llm_social_turn.py
workers/agent-worker/src/graph/npc_loop.py
workers/agent-worker/src/graph/recall_merge.py
workers/agent-worker/src/llm/factory.py
workers/agent-worker/src/main.py
workers/agent-worker/tests/test_action_intent.py
workers/agent-worker/tests/test_casual_fast_lane.py
workers/agent-worker/tests/test_fetch_state_and_memory.py
workers/agent-worker/tests/test_llm_social_memory.py
workers/agent-worker/tests/test_recall_merge.py
workers/agent-worker/tests/test_social_stream_extract.py
workers/agent-worker/tests/test_social_turn.py
workers/agent-worker/tests/test_tool_gate.py

✅ Files skipped from review due to trivial changes (3)

workers/agent-worker/src/config.py
apps/web/src/index.css
docs/ISSUE-LOG.md

🚧 Files skipped from review as they are similar to previous changes (9)

scripts/lib/gsd-openai-review.mjs
scripts/lib/agent-verify-map.mjs
scripts/lib/speak-browser-round.mjs
package.json
scripts/lib/gsd-review-merge.mjs
scripts/lib/e2e-memory-helpers.mjs
workers/agent-worker/src/graph/npc_loop.py
scripts/lib/gsd-review-providers-config.mjs
workers/agent-worker/src/graph/recall_merge.py

coderabbitai · 2026-06-17T14:15:35Z

+    files: [
+      "scripts/lib/**/*.mjs",
+      "scripts/agent-verify.mjs",
+      "scripts/gsd-review-providers.mjs",
+      "scripts/assert-refusal-markers-parity.mjs",
+      "scripts/playtest-speak-sla.mjs",
+      "scripts/benchmark-speak-browser.mjs",
+    ],


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include scripts/verify-phase20.mjs in this ESLint override.

That script uses AbortSignal and localStorage, but this files set doesn’t match it, so it can still hit no-undef under default rules.

Suggested surgical diff

files: [ "scripts/lib/**/*.mjs", "scripts/agent-verify.mjs", + "scripts/verify-phase20.mjs", "scripts/gsd-review-providers.mjs", "scripts/assert-refusal-markers-parity.mjs", "scripts/playtest-speak-sla.mjs", "scripts/benchmark-speak-browser.mjs", ],

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

files: [

"scripts/lib/**/*.mjs",

"scripts/agent-verify.mjs",

"scripts/gsd-review-providers.mjs",

"scripts/assert-refusal-markers-parity.mjs",

"scripts/playtest-speak-sla.mjs",

"scripts/benchmark-speak-browser.mjs",

],

files: [

"scripts/lib/**/*.mjs",

"scripts/agent-verify.mjs",

"scripts/verify-phase20.mjs",

"scripts/gsd-review-providers.mjs",

"scripts/assert-refusal-markers-parity.mjs",

"scripts/playtest-speak-sla.mjs",

"scripts/benchmark-speak-browser.mjs",

],

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eslint.config.js` around lines 7 - 14, Add the missing script file `scripts/verify-phase20.mjs` to the `files` array in the ESLint override configuration at the location shown in the diff. This script uses globals like `AbortSignal` and `localStorage` that need the override rules to avoid triggering `no-undef` errors. Simply include the string "scripts/verify-phase20.mjs" as an additional entry in the files array along with the other listed scripts.

coderabbitai · 2026-06-17T14:15:36Z

+    const res = await fetch(`${GS}/rooms/${ROOM_ID}/reset`, { method: "POST", headers });
+    lastStatus = res.status;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Encode ROOM_ID before interpolating it into path URLs.

Line 149 and Line 160 use raw ROOM_ID in path segments. Since ROOM_ID is env-driven, reserved characters (e.g. /, ?, #) can break route matching.

Suggested patch

- const res = await fetch(`${GS}/rooms/${ROOM_ID}/reset`, { method: "POST", headers }); + const res = await fetch(`${GS}/rooms/${encodeURIComponent(ROOM_ID)}/reset`, { method: "POST", headers }); @@ - const res = await fetch(`${GS}/rooms/${ROOM_ID}/state`, { headers }); + const res = await fetch(`${GS}/rooms/${encodeURIComponent(ROOM_ID)}/state`, { headers });

Also applies to: 160-160

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/benchmark-speak-browser.mjs` around lines 149 - 150, The ROOM_ID variable is being interpolated directly into URL path segments without proper encoding, which can cause issues if it contains special characters like `/`, `?`, or `#`. Wrap the ROOM_ID with encodeURIComponent() when interpolating it into the fetch URL template literals. This applies to both the fetch call in the reset endpoint (line 149) and the other occurrence mentioned at line 160, ensuring the ROOM_ID is properly URL-encoded before being used in the path.

coderabbitai · 2026-06-17T14:15:36Z

+  throw new Error(
+    "playwright not installed — run: cd scripts/.pw-deps && npm install",
+  );


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use pnpm in the Playwright install hint.

Line 47 still tells users to run npm install; this conflicts with monorepo command conventions.

As per coding guidelines, "Use pnpm instead of npm or yarn for all package management commands in the monorepo."

Suggested patch

throw new Error( - "playwright not installed — run: cd scripts/.pw-deps && npm install", + "playwright not installed — run: cd scripts/.pw-deps && pnpm install", );

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/lib/speak-browser-stack.mjs` around lines 46 - 48, The error message in the throw statement indicates users should run npm install, but this conflicts with the monorepo's package management standards. In the error message within the throw new Error call, replace the text "npm install" with "pnpm install" to align with the monorepo convention of using pnpm for all package management commands.

Source: Coding guidelines

coderabbitai · 2026-06-17T14:15:36Z

+const roomId = process.env.OVERLAY_STREAMING_ROOM_ID || `overlay-stream-${runId}`;
+const speakTimeoutMs = Math.max(90_000, e2eSpeakTimeoutMs());
+const tPartialWarnMs = Number.parseInt(process.env.OVERLAY_STREAMING_T_PARTIAL_MS || "15000", 10);
+const requireStreaming = process.env.OVERLAY_STREAMING_REQUIRE === "1";


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and inspect its structure git ls-files | grep -E "(verify-overlay-streaming|eslint)" | head -20

Repository: moyunzero/AetherLife

Length of output: 118

🏁 Script executed:

# Find the file and check its existence fd "verify-overlay-streaming" --type f

Repository: moyunzero/AetherLife

Length of output: 101

🏁 Script executed:

# Find ESLint config files fd "eslint" --type f

Repository: moyunzero/AetherLife

Length of output: 81

🏁 Script executed:

# Check for ESLint-related configs in root ls -la | grep -E "eslint|\.eslintrc"

Repository: moyunzero/AetherLife

Length of output: 198

🏁 Script executed:

# Inspect the ESLint config cat eslint.config.js

Repository: moyunzero/AetherLife

Length of output: 1077

🏁 Script executed:

# Inspect the actual script file cat -n scripts/verify-overlay-streaming.mjs | head -140

Repository: moyunzero/AetherLife

Length of output: 5744

Add scripts/verify-overlay-streaming.mjs to ESLint configuration for Node.js globals coverage.

The file is missing from the ESLint config's files array. Without it, ESLint will report no-undef errors for process, console, and localStorage since these are not defined in the default configuration.

Update eslint.config.js to include this script in the covered files list:

Configuration fix

files: [ "scripts/lib/**/*.mjs", "scripts/agent-verify.mjs", "scripts/gsd-review-providers.mjs", "scripts/assert-refusal-markers-parity.mjs", "scripts/playtest-speak-sla.mjs", "scripts/benchmark-speak-browser.mjs", + "scripts/verify-overlay-streaming.mjs", ],

🧰 Tools

🪛 ESLint

[error] 29-29: 'process' is not defined.

(no-undef)

[error] 31-31: 'process' is not defined.

(no-undef)

[error] 32-32: 'process' is not defined.

(no-undef)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/verify-overlay-streaming.mjs` around lines 29 - 32, The file `scripts/verify-overlay-streaming.mjs` is not included in the ESLint configuration's `files` array in `eslint.config.js`, causing no-undef errors for Node.js globals like `process`. Update the `eslint.config.js` file to add `scripts/verify-overlay-streaming.mjs` (or an appropriate glob pattern to cover all scripts) to the `files` array in the configuration object that handles Node.js environment globals, ensuring the ESLint rules properly recognize process and other Node.js built-in globals in this file.

Source: Linters/SAST tools

coderabbitai · 2026-06-17T14:15:36Z

+const runId = Date.now();
+const roomId = process.env.OVERLAY_STREAMING_ROOM_ID || `overlay-stream-${runId}`;
+const speakTimeoutMs = Math.max(90_000, e2eSpeakTimeoutMs());
+const tPartialWarnMs = Number.parseInt(process.env.OVERLAY_STREAMING_T_PARTIAL_MS || "15000", 10);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate OVERLAY_STREAMING_T_PARTIAL_MS before using it.

A typo such as OVERLAY_STREAMING_T_PARTIAL_MS=abc makes tPartialWarnMs become NaN, which disables the late-partial warning comparison and writes a misleading partialWithinWarn: false verdict instead of failing fast on bad config.

🛡️ Proposed config parser

-const tPartialWarnMs = Number.parseInt(process.env.OVERLAY_STREAMING_T_PARTIAL_MS || "15000", 10); +const tPartialWarnMs = parseNonNegativeIntEnv("OVERLAY_STREAMING_T_PARTIAL_MS", 15_000); const requireStreaming = process.env.OVERLAY_STREAMING_REQUIRE === "1"; @@ /** `@param` {number | null} a `@param` {number | null} b */ function deltaMs(a, b) { if (a == null || b == null) return null; return b - a; } + +function parseNonNegativeIntEnv(name, fallback) { + const raw = process.env[name]; + if (raw == null || raw === "") return fallback; + const value = Number.parseInt(raw, 10); + if (!Number.isFinite(value) || value < 0) { + throw new Error(`${name} must be a non-negative integer, got ${JSON.stringify(raw)}`); + } + return value; +}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const tPartialWarnMs = Number.parseInt(process.env.OVERLAY_STREAMING_T_PARTIAL_MS || "15000", 10);

const tPartialWarnMs = parseNonNegativeIntEnv("OVERLAY_STREAMING_T_PARTIAL_MS", 15_000);

const requireStreaming = process.env.OVERLAY_STREAMING_REQUIRE === "1";

/** `@param` {number | null} a `@param` {number | null} b */

function deltaMs(a, b) {

if (a == null || b == null) return null;

return b - a;

}

function parseNonNegativeIntEnv(name, fallback) {

const raw = process.env[name];

if (raw == null || raw === "") return fallback;

const value = Number.parseInt(raw, 10);

if (!Number.isFinite(value) || value < 0) {

throw new Error(`${name} must be a non-negative integer, got ${JSON.stringify(raw)}`);

}

return value;

}

🧰 Tools

🪛 ESLint

[error] 31-31: 'process' is not defined.

(no-undef)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/verify-overlay-streaming.mjs` at line 31, The variable tPartialWarnMs is parsed from the OVERLAY_STREAMING_T_PARTIAL_MS environment variable without validation, which means invalid values like "abc" result in NaN instead of failing fast. After parsing the environment variable with Number.parseInt in the tPartialWarnMs assignment, add a validation check to ensure the result is a valid number (not NaN), and throw or exit with an error message if the environment variable contains an invalid value, preventing silent failures where NaN disables downstream warning comparisons.

Merge in-session dialogue turns into recall so food preferences are available before async memory tail persists. Harden food extraction and relay summon patterns. Skip LLM stream partials on recall questions and emit merged reply once to fix overlay flicker (ISSUE-053). Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

workers/agent-worker/src/graph/npc_loop.py (1)

205-211: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refresh the hot snapshot cache after applied actions.

fetch_state() can skip HTTP for 3 seconds using _stale_worker_snapshots, but apply_tools() returns the game-server’s updated state without caching it. A second speak right after a move can therefore plan from the pre-action snapshot.

🐛 Proposed fix

-    body = safe_response_json(res)
+    body = safe_response_json(res)
+    updated_snapshot = body.get("state")
+    if not isinstance(updated_snapshot, dict):
+        updated_snapshot = state.get("room_snapshot", {}) or {}
+    _remember_worker_snapshot(room_id, player_id, updated_snapshot)
     record_phase_ms("t_apply_ms", int((time.perf_counter() - t0) * 1000))
     return {
         **state,
-        "room_snapshot": body.get("state", state.get("room_snapshot", {})),
+        "room_snapshot": updated_snapshot,
         "pending_actions": actions,
     }

Also applies to: 677-683

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/agent-worker/src/graph/npc_loop.py` around lines 205 - 211, The issue
is that apply_tools() returns updated game-server state but does not refresh the
hot snapshot cache, causing subsequent fetch_state() calls to continue using
stale pre-action snapshots from _stale_worker_snapshots for up to 3 seconds.
After apply_tools() returns the updated state from the game-server, refresh the
cached snapshot in _stale_worker_snapshots (or whichever cache storage mechanism
is used) with the new state so that the next fetch_state() call returns the
fresh post-action snapshot instead of a stale one. Apply the same fix in the
second location mentioned around lines 677-683.

workers/agent-worker/src/graph/recall_merge.py (1)

216-223: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Filter incompatible password topics, not just rank them lower.

If the user asks for a computer password and only a door-password row is retrieved, it remains a candidate and can be formatted as 电脑密码是 ...; door-lock prompts have the symmetric issue because topic handling only special-cases 电脑.

🐛 Proposed fix

+def _password_topic(text: str) -> str | None:
+    if "电脑" in text:
+        return "computer"
+    if "门锁" in text or "门禁" in text:
+        return "door"
+    return None
+
+
 def _password_topic_score(player_message: str, memory_text: str) -> int:
-    if "电脑" not in player_message:
+    asked = _password_topic(player_message)
+    if asked is None:
         return 0
-    if "电脑" in memory_text or "电脑密码" in memory_text:
+    remembered = _password_topic(memory_text)
+    if remembered == asked:
         return 2
-    if "门锁" in memory_text or "门禁" in memory_text:
-        return 0
+    if remembered is not None:
+        return -1
     return 1

     if not candidates:
         return None
+    explicit_topic = _password_topic(player_message)
+    if explicit_topic is not None:
+        candidates = [
+            item
+            for item in candidates
+            if _password_topic_score(player_message, _memory_item_text(item)) > 0
+        ]
+        if not candidates:
+            return None
 
     def sort_key(item: dict[str, Any]) -> tuple[int, int, int, float]:

Also applies to: 354-360

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/agent-worker/src/graph/recall_merge.py` around lines 216 - 223, The
_password_topic_score function currently ranks incompatible password topics
lower instead of filtering them out completely. When a user asks for a computer
password ("电脑") but only door-lock related memory ("门锁" or "门禁") is retrieved,
it returns 1 instead of 0, keeping it as a candidate. Refactor this function to
return 0 (incompatible/filtered out) when password topic types are mismatched
between the player message and memory text, ensuring symmetric handling for both
computer and door-lock password topic pairs. Check if the player message
indicates one type of password topic and the memory text indicates a different
incompatible type, then filter it out by returning 0.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@workers/agent-worker/src/graph/recall_merge.py`:
- Line 51: The `_FOOD_RECALL_RE` regex pattern currently matches both
first-person and second-person food questions, and it is only evaluated after
the top-level recall guard, causing first-person prompts like `我喜欢吃什么？` to be
missed. Refine the regex pattern to exclude second-person variations (e.g., ones
containing `你`) by making it more specific to only first-person food
preferences, then integrate the `_FOOD_RECALL_RE` check into the
`is_recall_question` function so that first-person food recall questions are
properly routed through the deterministic food merging logic earlier in the
flow.
- Around line 142-145: The issue is that the food-specific branch with
_is_food_recall_question() returns before applying the min_score filter,
allowing below-threshold food items to be selected while other item types would
be rejected. Move the score filtering logic (the line that creates the scored
list by filtering items where score >= min_score) to execute before the
_is_food_recall_question check, so that _pick_food_memory receives
already-filtered items that meet the min_score threshold, ensuring consistent
filtering behavior across all code paths.

---

Outside diff comments:
In `@workers/agent-worker/src/graph/npc_loop.py`:
- Around line 205-211: The issue is that apply_tools() returns updated
game-server state but does not refresh the hot snapshot cache, causing
subsequent fetch_state() calls to continue using stale pre-action snapshots from
_stale_worker_snapshots for up to 3 seconds. After apply_tools() returns the
updated state from the game-server, refresh the cached snapshot in
_stale_worker_snapshots (or whichever cache storage mechanism is used) with the
new state so that the next fetch_state() call returns the fresh post-action
snapshot instead of a stale one. Apply the same fix in the second location
mentioned around lines 677-683.

In `@workers/agent-worker/src/graph/recall_merge.py`:
- Around line 216-223: The _password_topic_score function currently ranks
incompatible password topics lower instead of filtering them out completely.
When a user asks for a computer password ("电脑") but only door-lock related
memory ("门锁" or "门禁") is retrieved, it returns 1 instead of 0, keeping it as a
candidate. Refactor this function to return 0 (incompatible/filtered out) when
password topic types are mismatched between the player message and memory text,
ensuring symmetric handling for both computer and door-lock password topic
pairs. Check if the player message indicates one type of password topic and the
memory text indicates a different incompatible type, then filter it out by
returning 0.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f7995973-f174-4ec3-b2af-554e8edfe234

📥 Commits

Reviewing files that changed from the base of the PR and between ee7ab4d and 7cfd787.

📒 Files selected for processing (9)

docs/ISSUE-LOG.md
workers/agent-worker/src/graph/action_intent.py
workers/agent-worker/src/graph/nodes/llm_social_turn.py
workers/agent-worker/src/graph/npc_loop.py
workers/agent-worker/src/graph/recall_merge.py
workers/agent-worker/tests/test_action_intent.py
workers/agent-worker/tests/test_recall_merge.py
workers/agent-worker/tests/test_social_stream_extract.py
workers/agent-worker/tests/test_tool_gate.py

🚧 Files skipped from review as they are similar to previous changes (1)

workers/agent-worker/src/graph/action_intent.py

coderabbitai · 2026-06-17T16:29:02Z

+    r"(?:最喜欢|最爱|喜欢|爱)吃([^。，,.!?；;\s～~吗呢吧啊？?]+)",
+    re.IGNORECASE,
+)
+_FOOD_RECALL_RE = re.compile(r"喜欢吃什么|爱吃什么|最爱吃|口味|爱吃什么", re.IGNORECASE)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Route first-person food recall through is_recall_question.

_FOOD_RECALL_RE is only used after the top-level recall guard, so prompts like 我喜欢吃什么？ never reach deterministic food merging. Tighten the regex so it does not match second-person seed turns like 我喜欢吃西瓜，你喜欢吃什么？, then include it in is_recall_question.

🐛 Proposed fix

-_FOOD_RECALL_RE = re.compile(r"喜欢吃什么|爱吃什么|最爱吃|口味|爱吃什么", re.IGNORECASE) +_FOOD_RECALL_RE = re.compile( + r"(?:我|我的).{0,8}(?:最喜欢|最爱|喜欢|爱)吃什么|(?:我|我的).{0,8}口味", + re.IGNORECASE, +) def is_recall_question(message: str) -> bool: msg = message.strip() if not msg: return False if any(marker in msg for marker in _DISCLOSURE_MARKERS): return False + if _FOOD_RECALL_RE.search(msg) and _has_question_cue(msg): + return True if any(marker in msg for marker in _STRONG_RECALL_MARKERS): return True

Also applies to: 68-78

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workers/agent-worker/src/graph/recall_merge.py` at line 51, The `_FOOD_RECALL_RE` regex pattern currently matches both first-person and second-person food questions, and it is only evaluated after the top-level recall guard, causing first-person prompts like `我喜欢吃什么？` to be missed. Refine the regex pattern to exclude second-person variations (e.g., ones containing `你`) by making it more specific to only first-person food preferences, then integrate the `_FOOD_RECALL_RE` check into the `is_recall_question` function so that first-person food recall questions are properly routed through the deterministic food merging logic earlier in the flow.

coderabbitai · 2026-06-17T16:29:03Z

+    msg = player_message.strip()
+    if _is_food_recall_question(msg):
+        return _pick_food_memory(items, min_score=min_score)
+    scored = [item for item in items if float(item.get("score") or 0) >= min_score]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply min_score before food-specific selection.

The food branch returns before the shared score gate, so a below-threshold food-looking row can answer recall even though password/nickname rows would be rejected.

🐛 Proposed fix

"""Pick the retrieved row that matches the recall question type, not only top score.""" items = retrieved_memories or [] msg = player_message.strip() + scored = [item for item in items if float(item.get("score") or 0) >= min_score] if _is_food_recall_question(msg): - return _pick_food_memory(items, min_score=min_score) - scored = [item for item in items if float(item.get("score") or 0) >= min_score] + return _pick_food_memory(scored, min_score=min_score) if not scored: return None

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workers/agent-worker/src/graph/recall_merge.py` around lines 142 - 145, The issue is that the food-specific branch with _is_food_recall_question() returns before applying the min_score filter, allowing below-threshold food items to be selected while other item types would be rejected. Move the score filtering logic (the line that creates the scored list by filtering items where score >= min_score) to execute before the _is_food_recall_question check, so that _pick_food_memory receives already-filtered items that meet the min_score threshold, ensuring consistent filtering behavior across all code paths.

…sword recall CodeRabbit PR #9 fixes: apply_tools now updates worker hot cache from apply-actions state; password recall rejects door-lock rows for computer questions; contextual location regex avoids misclassifying narrative questions. Verified with pytest, agent:verify, verify:phase20, and verify:overlay-streaming. Co-authored-by: Cursor <cursoragent@cursor.com>

moyunzero and others added 8 commits June 16, 2026 12:01

test(20-02): add failing SOLO-01 nickname recall merge tests

5760280

- Nickname refusal override case (RED) - Non-recall question leaves LLM reply unchanged Co-authored-by: Cursor <cursoragent@cursor.com>

feat(20-03): add verify-phase20 E2E memory helpers and gate script

3807f18

Cross-session password/nickname recall with overlay speak, refusal guard, latency smoke, and 60s single-retry on 429/timeout/5xx per D-13. Co-authored-by: Cursor <cursoragent@cursor.com>

chore(20-03): register verify:phase20 and agent golden flow GF-10

34c73d6

Wire pnpm verify:phase20 npm script and diff triggers for recall_merge and phase 20 E2E scripts in agent-verify-map. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

moyunzero merged commit 7733608 into main Jun 18, 2026
2 checks passed

coderabbitai Bot mentioned this pull request Jun 21, 2026

feat(phase21): world echo lore drawer, verify gate, and memory recall fixes #10

Merged

6 tasks

	"若玩家追问 Memory summary 中已有的事实，reply 须直接给出答案，勿拒绝或说「不记得」。"
	"若玩家追问 Memory summary 中已有的事实, reply 须直接给出答案, 勿拒绝或说「不记得」。"

		const res = await fetch(`${GS}/rooms/${ROOM_ID}/reset`, { method: "POST", headers });
		lastStatus = res.status;

-const tPartialWarnMs = Number.parseInt(process.env.OVERLAY_STREAMING_T_PARTIAL_MS || "15000", 10);
+const tPartialWarnMs = parseNonNegativeIntEnv("OVERLAY_STREAMING_T_PARTIAL_MS", 15_000);
+const requireStreaming = process.env.OVERLAY_STREAMING_REQUIRE === "1";
+/** `@param` {number | null} a `@param` {number | null} b */
+function deltaMs(a, b) {
+  if (a == null || b == null) return null;
+  return b - a;
+}
+function parseNonNegativeIntEnv(name, fallback) {
+  const raw = process.env[name];
+  if (raw == null || raw === "") return fallback;
+  const value = Number.parseInt(raw, 10);
+  if (!Number.isFinite(value) || value < 0) {
+    throw new Error(`${name} must be a non-negative integer, got ${JSON.stringify(raw)}`);
+  }
+  return value;
+}

Conversation

moyunzero commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

chatgpt-codex-connector Bot commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

moyunzero commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading