feat: 7-day default lookback for transcript streaming #1440
e12849e to dab082a
```rust
// Apply lookback filter to subagent files
if let Some(cutoff) = lookback_cutoff {
    let dominated = path
        .metadata()
        .ok()
        .and_then(|m| m.modified().ok())
        .is_some_and(|mtime| mtime < cutoff);
    if dominated {
        continue;
    }
}
```
🟡 Lookback filter in subagent sweep skips already-tracked sessions, causing potential data loss in release mode
In `sweep_subagents_for_session`, the lookback filter is applied to all subagent files before the DB is checked, so already-tracked sessions with old mtimes are silently skipped. This is inconsistent with `SweepCoordinator::run_sweep()` at src/daemon/sweep_coordinator.rs:88-109, which correctly applies lookback only to new (untracked) sessions and always resumes processing for existing ones.
In release mode, `transcript_sweep` defaults to `false` (src/feature_flags.rs:86), so neither the initial sweep nor the periodic sweeps run, and this checkpoint-triggered subagent sweep is the only path that processes subagent files. If the daemon restarts and a subagent file was partially processed (watermark behind the file content) but its mtime is older than the lookback window (default 7 days), the remaining unprocessed data is permanently skipped with no fallback.
Scenario leading to data loss
- Subagent A is tracked in the DB with its watermark at 50KB
- Subagent A's file is 100KB, mtime = 2 weeks ago (subagent finished)
- Daemon restarts
- Main session checkpoint fires → `sweep_subagents_for_session` → lookback filter → 2 weeks > 7 days → SKIP
- 50KB of unprocessed subagent transcript data is permanently lost
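One way to close this gap, sketched under assumptions, is to consult tracking state before the mtime check, mirroring what the comment says `run_sweep()` already does. The helper `should_skip_subagent` below is hypothetical: `is_tracked` stands in for the DB lookup and `mtime` for the result of the `metadata().modified()` chain in the quoted diff.

```rust
use std::time::SystemTime;

/// Hypothetical decision helper for `sweep_subagents_for_session`.
/// `is_tracked` stands in for the DB lookup (assumption), `mtime` is the
/// file's modification time (None if the stat failed), and `cutoff` is
/// None when lookback is disabled.
fn should_skip_subagent(
    is_tracked: bool,
    mtime: Option<SystemTime>,
    cutoff: Option<SystemTime>,
) -> bool {
    // Already-tracked files are always resumed regardless of age, so a
    // watermark behind the end of the file is never abandoned.
    if is_tracked {
        return false;
    }
    // Untracked files are skipped only when a cutoff is configured and the
    // mtime is known to be older than it; an unreadable mtime is kept.
    match (cutoff, mtime) {
        (Some(cutoff), Some(mtime)) => mtime < cutoff,
        _ => false,
    }
}
```

In the scenario above, Subagent A is tracked, so it would be resumed from its 50KB watermark even though its mtime falls outside the 7-day window.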
Limits transcript streaming sweep discovery to files modified within the last 7 days by default, reducing unnecessary I/O and memory usage for users with large transcript histories. Checkpoint-triggered (immediate) processing is unaffected by lookback.

Configuration:
- `transcript_streaming_lookback_days` in ~/.git-ai/config.json
- `GIT_AI_TRANSCRIPT_STREAMING_LOOKBACK_DAYS` env var override
- Set to 0 to disable lookback (process all history)
- Default: 7 days

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
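The configuration rules above can be sketched as a small resolver. This is an illustration, not the PR's code: `resolve_lookback_days` is a hypothetical name, and falling back to the config value when the env var does not parse as a number is an assumption.

```rust
/// Default window from the commit message.
const DEFAULT_LOOKBACK_DAYS: u64 = 7;

/// Hypothetical resolver: env var overrides the config file, which
/// overrides the 7-day default. Returns None when lookback is disabled (0).
/// Assumption: an unparseable env value is ignored rather than rejected.
fn resolve_lookback_days(config_value: Option<u64>, env_value: Option<&str>) -> Option<u64> {
    let days = env_value
        .and_then(|s| s.trim().parse::<u64>().ok())
        .or(config_value)
        .unwrap_or(DEFAULT_LOOKBACK_DAYS);
    // 0 means "process all history", i.e. no cutoff at all.
    if days == 0 { None } else { Some(days) }
}
```

A caller would read the variable with `std::env::var("GIT_AI_TRANSCRIPT_STREAMING_LOOKBACK_DAYS").ok()` and pass `.as_deref()` alongside the value parsed from config.json.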
Stores `lookback_days` instead of a pre-computed `SystemTime` cutoff so the rolling window stays accurate over long daemon lifetimes (previously the cutoff was fixed at construction time and would drift).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
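A minimal sketch of the "store days, compute cutoff on demand" idea, assuming a hypothetical `LookbackFilter` struct (not the PR's actual type). Taking `now` as a parameter in `cutoff_at` makes the rolling behavior easy to check.

```rust
use std::time::{Duration, SystemTime};

const SECS_PER_DAY: u64 = 24 * 60 * 60;

/// Hypothetical holder: keeps the configured window, not a fixed cutoff.
struct LookbackFilter {
    lookback_days: Option<u64>, // None = lookback disabled
}

impl LookbackFilter {
    /// Cutoff relative to `now`; recomputed on every call so the window
    /// moves forward as the daemon keeps running. Returns None when
    /// lookback is disabled or the subtraction would underflow.
    fn cutoff_at(&self, now: SystemTime) -> Option<SystemTime> {
        let days = self.lookback_days?;
        now.checked_sub(Duration::from_secs(days.saturating_mul(SECS_PER_DAY)))
    }

    /// Convenience wrapper using the current wall-clock time.
    fn cutoff(&self) -> Option<SystemTime> {
        self.cutoff_at(SystemTime::now())
    }
}
```

With a cutoff captured once at construction, a daemon running for 30 days would still be filtering against a date 37 days back; recomputing per sweep keeps it at 7.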
Already-tracked sessions with partial watermarks are always resumed regardless of file age, preventing data loss on daemon restart or delayed processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
84927dd to ad1bb16
Summary
Configuration
New config option `transcript_streaming_lookback_days` (not a feature flag):
- `~/.git-ai/config.json` → `"transcript_streaming_lookback_days": 30`
- `GIT_AI_TRANSCRIPT_STREAMING_LOOKBACK_DAYS=30`

Design decisions
- Filters on file `mtime`, which is cheap (single `stat` syscall per file, no file reads)
- `transcript_streaming` feature flag remains `true` by default in both debug and release

Test plan
- `task lint && task fmt` clean

🤖 Generated with Claude Code