fix(reward): scoring pipeline dirty-scan and skip-path fixes by chiefmojo · Pull Request #1847 · MemTensor/MemOS

chiefmojo · 2026-06-01T01:51:57Z

Summary

Six targeted fixes to the dirty-episode recovery and scoring skip path. These close infinite rescore loops that appeared in companion infrastructure after the v2.0.5 regression fixes (PR #1784) landed.

1. Paginate dirty-closed episode scan

collectDirtyClosedEpisodes() replaces the fixed list({ limit: 500 }) call. It walks all pages, so installations with more than 500 closed dirty episodes are fully recovered on startup instead of being silently truncated.
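A minimal sketch of the paginated walk, written as a hypothetical version of collectDirtyClosedEpisodes(). The `ListFn` signature, offset-based paging, and `EpisodeRow` fields are assumptions for illustration, not the real MemOS store API; only the 500-row page size comes from the commit notes. The loop stops at the first short page, so a table whose size is an exact multiple of 500 costs one extra, empty read.

```typescript
// Hypothetical episode row shape; the real schema may differ.
interface EpisodeRow {
  id: string;
  closed: boolean;
  rewardDirty: boolean;
}

// Page size matching the 500-row cap described in the commit notes.
const PAGE_SIZE = 500;

// Hypothetical paged list API: at most `limit` rows starting at `offset`.
type ListFn = (opts: { limit: number; offset: number }) => EpisodeRow[];

// Walk every page until a short (or empty) page signals the end,
// collecting closed episodes that still need reward scoring.
function collectDirtyClosedEpisodes(
  list: ListFn,
  isDirty: (row: EpisodeRow) => boolean,
): EpisodeRow[] {
  const dirty: EpisodeRow[] = [];
  for (let offset = 0; ; offset += PAGE_SIZE) {
    const page = list({ limit: PAGE_SIZE, offset });
    for (const row of page) {
      if (row.closed && isDirty(row)) dirty.push(row);
    }
    if (page.length < PAGE_SIZE) break; // last page reached
  }
  return dirty;
}

// Usage: 1,200 closed episodes in memory, every third one dirty.
const table: EpisodeRow[] = Array.from({ length: 1200 }, (_, i) => ({
  id: `ep-${i}`,
  closed: true,
  rewardDirty: i % 3 === 0,
}));
const listFromTable: ListFn = ({ limit, offset }) =>
  table.slice(offset, offset + Math.min(limit, PAGE_SIZE));
const found = collectDirtyClosedEpisodes(listFromTable, (r) => r.rewardDirty);
console.log(found.length); // 400
```

Because the sketch collects the full list before any scoring starts, clearing flags during recovery cannot shift offsets and cause rows to be skipped mid-scan.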

2. Set `r_task=0` on skipped episodes

When reward.ts skips an episode (too few exchanges, trivial content), it sets reward.skipped: true but returns without writing r_task. The episode stays in the dirty-scan queue and is re-queued on every startup pass. Fix: write r_task=0 on skip exit.
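To show why the missing write causes a loop, here is a hedged sketch of the null-`r_task` branch and a skip exit that writes the 0 sentinel. The `Episode` shape and the helper names `rTaskMissing` and `skipEpisode` are hypothetical; only the field names follow the PR text.

```typescript
// Hypothetical, simplified episode shape; field names follow the PR text.
interface Episode {
  id: string;
  closeReason?: "finalized" | "abandoned";
  rTask: number | null;
  meta: { rewardDirty?: object; reward?: { skipped?: boolean } };
}

// Simplified null-rTask branch of a dirty check (hypothetical helper).
// The explicit null test treats the sentinel 0 as "scored".
function rTaskMissing(ep: Episode): boolean {
  return ep.rTask === null;
}

// Skip exit: record the skip AND write rTask = 0 so the null branch stops firing.
function skipEpisode(ep: Episode): Episode {
  return {
    ...ep,
    rTask: 0,
    meta: { ...ep.meta, reward: { ...ep.meta.reward, skipped: true } },
  };
}

const trivial: Episode = { id: "ep-1", closeReason: "abandoned", rTask: null, meta: {} };
console.log(rTaskMissing(trivial)); // true: re-queued on every startup
console.log(rTaskMissing(skipEpisode(trivial))); // false: leaves the dirty scan
```

The explicit `=== null` comparison matters: a falsy check such as `!ep.rTask` would treat the 0 sentinel as unscored and reintroduce the loop.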

3. Clear `rewardDirty` flag on skip path

rewardDirty was only cleared on successful scoring. Skipped episodes kept the dirty flag and re-entered the scan on the next pass. Fix: clear rewardDirty on the skip path alongside setting r_task=0.
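A sketch of the skip-path meta patch, assuming meta is persisted with `JSON.stringify`, which omits keys whose value is `undefined`. `EpisodeMeta` and `skipPatch` are hypothetical names, and the `{ reason }` payload of `rewardDirty` is invented for the example.

```typescript
// Hypothetical meta shape stored as meta_json; only the relevant fields.
interface EpisodeMeta {
  rewardDirty?: { reason: string }; // payload shape is an assumption
  reward?: { skipped?: boolean };
}

// Skip-path patch mirroring the normal scoring path: mark the skip and set
// rewardDirty to undefined so the key disappears once meta is serialized.
function skipPatch(meta: EpisodeMeta): EpisodeMeta {
  return { ...meta, reward: { ...meta.reward, skipped: true }, rewardDirty: undefined };
}

const before: EpisodeMeta = { rewardDirty: { reason: "follow_up" } };
const persisted = JSON.stringify(skipPatch(before));
console.log(persisted); // {"reward":{"skipped":true}}
```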

4. Drain reward after dirty-closed recovery in lightweight mode

recoverDirtyClosedEpisodes() emits episode.finalized and calls flush(). In lightweight mode (the default), flush() returns before draining the reward subscriber, so recovered episodes are never scored. Fix: explicitly call rewardRunner.run() for each recovered episode that is still dirty after flush(), then call flush() again to drain downstream work.
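The recovery tail can be sketched as follows. `Pipeline`, `RewardRunner`, `drainRecoveredEpisodes`, and the `isDirty` callback are hypothetical stand-ins for flush(), rewardRunner.run(), and episodeRewardIsDirty(); the real signatures may differ. The stub pipeline models lightweight mode by having flush() resolve without scoring anything.

```typescript
// Hypothetical stand-ins for the pipeline pieces named in the PR.
interface Pipeline {
  flush(): Promise<void>;
}
interface RewardRunner {
  run(episodeId: string): Promise<void>;
}

// Flush the capture pass, explicitly score anything still dirty,
// then flush again to drain downstream work.
async function drainRecoveredEpisodes(
  pipeline: Pipeline,
  rewardRunner: RewardRunner,
  recoveredIds: string[],
  isDirty: (id: string) => boolean,
): Promise<string[]> {
  await pipeline.flush();
  const scored: string[] = [];
  for (const id of recoveredIds) {
    if (isDirty(id)) {
      await rewardRunner.run(id); // one episode at a time in this sketch
      scored.push(id);
    }
  }
  await pipeline.flush();
  return scored;
}

// Usage with in-memory stubs.
let flushes = 0;
const stillDirty = new Set(["ep-a", "ep-b"]);
const lightweightPipeline: Pipeline = {
  flush: async () => {
    flushes += 1; // lightweight mode: returns without touching reward scoring
  },
};
const runner: RewardRunner = {
  run: async (id) => {
    stillDirty.delete(id); // stand-in for scoring that clears the dirty state
  },
};

const done = drainRecoveredEpisodes(lightweightPipeline, runner, ["ep-a", "ep-b", "ep-c"], (id) =>
  stillDirty.has(id),
).then((scored) => {
  console.log(scored.join(","), stillDirty.size, flushes); // ep-a,ep-b 0 2
  return scored;
});
```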

5. Clear `rewardDirty` before recovery scoring

Recovery stamped closeReason and recoveryReason but did not clear rewardDirty. If the watchdog fires mid-scoring and leaves rTask null, the next startup re-picks the episode indefinitely. Fix: add rewardDirty: undefined to the updateMeta call before recovery scoring begins.
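A hedged sketch of the ordering fix, with a thrown error standing in for the watchdog interrupt. `updateMeta` here is a hypothetical in-memory version that shallow-merges the patch and drops `undefined` keys; the real store may merge differently, and the `{ reason }` payload is invented. Clearing the flag alone still leaves `rTask` null with `closeReason` finalized, which is the case the guard in fix 6 covers.

```typescript
// Hypothetical, simplified types; the real episode record has more fields.
interface RecoveryMeta {
  rewardDirty?: { reason: string };
  closeReason?: string;
  recoveryReason?: string;
}
interface RecoverableEpisode {
  id: string;
  rTask: number | null;
  meta: RecoveryMeta;
}

// Hypothetical updateMeta: shallow-merge, then round-trip through JSON so
// keys set to undefined are dropped, as they would be from meta_json.
function updateMeta(ep: RecoverableEpisode, patch: RecoveryMeta): void {
  ep.meta = JSON.parse(JSON.stringify({ ...ep.meta, ...patch })) as RecoveryMeta;
}

// Clear the dirty flag in the same write that stamps the recovery reasons,
// BEFORE any (possibly interrupted) scoring work starts.
function recoverEpisode(ep: RecoverableEpisode, score: (ep: RecoverableEpisode) => void): void {
  updateMeta(ep, {
    closeReason: "finalized",
    recoveryReason: "dirty_reward_rescore",
    rewardDirty: undefined,
  });
  score(ep); // on success this would set ep.rTask
}

// Simulate the init watchdog killing scoring midway.
const big: RecoverableEpisode = { id: "ep-large", rTask: null, meta: { rewardDirty: { reason: "follow_up" } } };
try {
  recoverEpisode(big, () => {
    throw new Error("init watchdog fired");
  });
} catch {
  // the daemon would be killed and respawned here
}
console.log(big.meta.rewardDirty === undefined, big.rTask); // true null
```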

6. Prevent open-episode crash-respawn loop on watchdog interrupt

episodeRewardIsDirty() matched any episode with closeReason="finalized" and no rTask, including those already stamped with recoveryReason="dirty_reward_rescore" by recovery. This PR adds a guard so that episodes already in a recovery path are excluded from the dirty scan.
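A simplified stand-in for the guarded check, covering only the conditions this PR touches. `episodeRewardIsDirtySketch` and `EpisodeState` are hypothetical, and the real predicate has further branches (for example, trace-count mismatches and abandoned episodes).

```typescript
// Value named in the PR as recoveryReason="dirty_reward_rescore".
const DIRTY_REWARD_RESCORE = "dirty_reward_rescore";

// Hypothetical, simplified episode state.
interface EpisodeState {
  rTask: number | null;
  closeReason?: string;
  recoveryReason?: string;
  rewardDirty?: object;
}

function episodeRewardIsDirtySketch(ep: EpisodeState): boolean {
  if (ep.rewardDirty) return true; // an explicit dirty flag always wins
  if (ep.closeReason === "finalized" && ep.rTask === null) {
    // Guard: skip episodes that recovery already stamped, even if a
    // watchdog interrupt left rTask null.
    return ep.recoveryReason !== DIRTY_REWARD_RESCORE;
  }
  return false;
}

console.log(episodeRewardIsDirtySketch({ rTask: null, closeReason: "finalized" })); // true
console.log(
  episodeRewardIsDirtySketch({
    rTask: null,
    closeReason: "finalized",
    recoveryReason: DIRTY_REWARD_RESCORE,
  }),
); // false
```

In this sketch an interrupted episode stays at `rTask === null` and is no longer retried automatically; that is the trade-off of breaking the loop this way.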

Test plan

Fresh DB: startup recovery with 0 dirty episodes; no change in behavior
DB with >500 closed dirty episodes: all pages are walked, all are recovered
Episode with too few exchanges: processed once and skipped, r_task=0 written, not re-queued on next startup
Lightweight mode: recovery episodes are fully scored after flush()
Watchdog interrupt simulation: episode stamped with recoveryReason="dirty_reward_rescore" is not re-picked on next startup

🤖 Generated with Claude Code

The triviality-gate skip path wrote reward.skipped=true but never cleared rewardDirty from meta_json. Since episodeRewardIsDirty() checks the rewardDirty object flag before the skip gate, skipped episodes with the flag set re-entered the dirty scan on every bridge restart, scoring and re-skipping indefinitely. The normal scoring path already set rewardDirty: undefined; this change mirrors that pattern in the skip branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The skip path wrote reward.skipped=true but never called setRTask(), leaving r_task=NULL. For abandoned episodes, episodeRewardIsDirty() falls through to the r_task==null check and returns true, so those episodes re-entered the dirty scan on every bridge start, were re-skipped, and looped indefinitely. Setting r_task=0 before updateMeta means the null-r_task branch in episodeRewardIsDirty() no longer fires, permanently clearing the episode from the dirty scan after its first skip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

clampLimit() in _helpers.ts caps list() at 500 regardless of the limit argument passed. The prior limit:1000 change (87165daf) was a no-op: both values hit the same ceiling, leaving episodes beyond rank 500 permanently invisible to the dirty scan. Replace both scan sites (startup and periodic) with collectDirtyClosedEpisodes(), which paginates in 500-row pages until exhausted. All closed episodes are now covered regardless of total count. This was also the root cause of the "dirty-17" mystery: those episodes were at ranks 536-924, outside the 500-row window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
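A hypothetical reconstruction of a clamp like the one described, showing why the limit:1000 change could not help. The signature and the lower bound of 1 are assumptions; only the 500 ceiling comes from the commit message.

```typescript
// Hypothetical clamp modeled on the clampLimit() behavior the commit describes;
// the real function in _helpers.ts may have a different signature.
const MAX_LIST_LIMIT = 500; // ceiling stated in the commit message

function clampLimit(limit: number): number {
  // Assumed lower bound of 1; upper bound fixed at MAX_LIST_LIMIT.
  return Math.min(Math.max(Math.floor(limit), 1), MAX_LIST_LIMIT);
}

// Raising the requested limit past the cap changes nothing.
console.log(clampLimit(500) === clampLimit(1000)); // true

// A single list() call therefore never reaches rank 536 (1-based),
// where the first of the "dirty-17" episodes sat.
const window500 = clampLimit(1000);
console.log(536 <= window500); // false
```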

…mode

recoverDirtyClosedEpisodes relied on flush() → reward.drain() to fire R_human scoring after the capture pass. flush() returns early in lightweight mode (the default), so the reward subscriber's 30 s timer was cancelled by shutdown() before it fired, leaving traceCount permanently mismatched and the episode dirty on every restart.

Fix: after flush() drains the capture pass, explicitly call rewardRunner.run() for any episode that episodeRewardIsDirty() still considers dirty, mirroring the pattern already used by recoverOpenEpisodesAsSessionEnd. A second flush() then drains downstream work (L2 / L3 / skills).

Regression test: dirty-reward recovery does not insert orphan traces. It seeds an episode with traceCount=1 and 2 trace IDs (one has a tool call whose endedAt differs from the trace ts, which produces an orphan step in runReflect) and verifies that:
1. trace_ids_json stays at 2 after recovery (orphan insert guard).
2. traceCount is updated to 2 after the first recovery pass.
3. A second restart does not re-score the episode (loop stopped).

Also fixes the pre-existing test "rescoring closed episodes when traces were appended after the last reward", which failed for the same reason.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rash-respawn loop

recoverDirtyClosedEpisodes() emits episode.finalized and awaits flush(), which runs per-step capture reflection (potentially hundreds of LLM calls). If the daemon init watchdog (120 s) fires before flush() completes, reward.ts never clears the rewardDirty flag, so the episode appears dirty on every subsequent startup and triggers the same scoring attempt. The result is an infinite crash-respawn loop that hammers the configured LLM at ~5,500 calls/hour.

Fix: clear rewardDirty in updateMeta before starting recovery. reward.ts already sets rewardDirty: undefined on successful scoring (idempotent); if the watchdog fires mid-scoring, the flag is already gone, so the next startup finds the episode clean and init completes in milliseconds.

Root cause of the incident: PR MemTensor#8's 120 s init watchdog (correct) combined with a large episode (254 traces, 238 per-step reflection calls, ~160 s) that had rewardDirty set from a follow_up reopen. The episode could never finish scoring within the watchdog window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…terrupt

Both recoverOpenEpisodesAsSessionEnd and recoverDirtyClosedEpisodes stamp recoveryReason=DIRTY_REWARD_RESCORE before emitting episode.finalized. The condition-4 guard in episodeRewardIsDirty now excludes episodes with this reason, so a watchdog-killed scoring run (rTask=null, closeReason=finalized) no longer re-triggers rescoring on every subsequent startup.

Root cause: PR MemTensor#8's initWatchdog (120 s default) interrupted scoring for episodes with 80+ steps (~130 s). The episode remained at rTask=null with closeReason=finalized, matching condition 4 exactly, and looped at ~30 restarts/hour, consuming ~5,400 Qwen calls/hour.

Fixes MemTensor#11

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chiefmojo and others added 6 commits May 31, 2026 17:52

chiefmojo wants to merge 6 commits into MemTensor:main from chiefmojo:pr/scoring-dirty-scan-skip-fixes
