
feat: 7-day default lookback for transcript streaming #1440

Open
svarlamov wants to merge 3 commits into feat/vsc-copilot-token-usage-otel from feat/transcript-streaming-default-lookback

Member

@svarlamov svarlamov commented May 25, 2026

Summary

  • Adds a configurable lookback period (default: 7 days) to transcript streaming sweep discovery
  • Files not modified within the lookback window are skipped during periodic sweeps, reducing I/O and memory usage
  • Checkpoint-triggered (immediate) processing is unaffected — active sessions are always processed regardless of lookback
  • Subagent transcript discovery also respects the lookback filter

Configuration

New config option transcript_streaming_lookback_days (not a feature flag):

  • Config file: ~/.git-ai/config.json, e.g. "transcript_streaming_lookback_days": 30
  • Env var: GIT_AI_TRANSCRIPT_STREAMING_LOOKBACK_DAYS=30
  • Default: 7 days
  • Disable: Set to 0 to process all history (no lookback limit)
  • Precedence: env var > config file > default (7)
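The precedence rule above can be sketched as a small pure function. This is an illustrative sketch only, not the PR's code: `resolve_lookback_days`, `DEFAULT_LOOKBACK_DAYS`, and the choice to fall through to the config file when the env var does not parse are all assumptions.

```rust
/// Default from the PR description (7 days). The constant name is hypothetical.
const DEFAULT_LOOKBACK_DAYS: u64 = 7;

/// Resolve the effective lookback in days.
///
/// `env_value` stands in for the raw GIT_AI_TRANSCRIPT_STREAMING_LOOKBACK_DAYS
/// string; `file_value` stands in for the already-parsed config.json field.
/// Returns `None` when the lookback is disabled (value 0 = process all history).
fn resolve_lookback_days(env_value: Option<&str>, file_value: Option<u64>) -> Option<u64> {
    let days = env_value
        // Assumption: an unparsable env var is ignored rather than treated as an error.
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        // The config file is consulted only when the env var gives no usable value.
        .or(file_value)
        .unwrap_or(DEFAULT_LOOKBACK_DAYS);

    if days == 0 {
        None
    } else {
        Some(days)
    }
}
```

Returning `Option<u64>` keeps "no limit" distinct from any real number of days, so callers cannot accidentally compute a zero-length window.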

Design decisions

  • Applied at the sweep coordinator level (discovery time) for maximum efficiency — old files never get opened or read
  • Fail-open: if a file cannot be stat'd, it's assumed to be within the window
  • Uses file mtime which is cheap (single stat syscall per file, no file reads)
  • The transcript_streaming feature flag remains true by default in both debug and release
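A minimal sketch of discovery-time filtering with fail-open behaviour, under the assumption that candidates arrive as a list of paths. `discover_recent` is a hypothetical name, and the real sweep coordinator may be structured differently.

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Keep only candidates that fall inside the lookback window.
///
/// `cutoff == None` means the lookback is disabled, so every candidate is kept.
/// Fail-open: if metadata or mtime cannot be read, the file is kept.
fn discover_recent(candidates: Vec<PathBuf>, cutoff: Option<SystemTime>) -> Vec<PathBuf> {
    candidates
        .into_iter()
        .filter(|path| {
            let Some(cutoff) = cutoff else { return true };
            // One metadata lookup per file; file contents are never read here.
            let too_old = path
                .metadata()
                .ok()
                .and_then(|m| m.modified().ok())
                .is_some_and(|mtime| mtime < cutoff);
            !too_old
        })
        .collect()
}
```

Because the filter runs before any file is opened, files outside the window cost only a metadata lookup.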

Test plan

  • Unit tests for lookback filtering in sweep coordinator (4 tests)
  • Unit tests for config resolution (3 tests: default, env override, zero=unlimited)
  • All 260 transcript-related tests pass
  • All 3029+ integration tests pass (1 known flaky unrelated)
  • task lint && task fmt clean

🤖 Generated with Claude Code



devin-ai-integration[bot]

This comment was marked as resolved.

@svarlamov svarlamov force-pushed the feat/transcript-streaming-default-lookback branch from e12849e to dab082a Compare May 25, 2026 20:05
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 new potential issue.


Comment on lines +387 to +397
// Apply lookback filter to subagent files
if let Some(cutoff) = lookback_cutoff {
    let dominated = path
        .metadata()
        .ok()
        .and_then(|m| m.modified().ok())
        .is_some_and(|mtime| mtime < cutoff);
    if dominated {
        continue;
    }
}
Contributor

@devin-ai-integration devin-ai-integration Bot May 25, 2026


🟡 Lookback filter in subagent sweep skips already-tracked sessions, causing potential data loss in release mode

In sweep_subagents_for_session, the lookback filter is applied to ALL subagent files before checking the DB, which means already-tracked sessions with old mtimes are silently skipped. This is inconsistent with the SweepCoordinator::run_sweep() at src/daemon/sweep_coordinator.rs:88-109, which correctly applies lookback only to new (untracked) sessions and always resumes processing for existing sessions.

In release mode, transcript_sweep defaults to false (src/feature_flags.rs:86), so neither the initial sweep nor periodic sweeps run. The only path for subagent processing is this checkpoint-triggered subagent sweep. If the daemon restarts and a subagent file was partially processed (watermark behind file content) but has an mtime older than the lookback window (default 7 days), the remaining unprocessed data will be permanently skipped with no fallback.

Scenario leading to data loss
  1. Subagent A is tracked in DB with watermark at 50KB
  2. Subagent A's file has 100KB, mtime = 2 weeks ago (subagent finished)
  3. Daemon restarts
  4. Main session checkpoint fires → sweep_subagents_for_session → lookback filter → 2 weeks > 7 days → SKIP
  5. 50KB of unprocessed subagent transcript data is permanently lost
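One way to avoid the scenario above is to consult the tracking state before applying the lookback filter. The sketch below is illustrative only: `should_skip_subagent` is a hypothetical helper, and `already_tracked` stands in for whatever DB lookup the daemon actually performs.

```rust
use std::time::SystemTime;

/// Decide whether a subagent transcript should be skipped.
///
/// - Already-tracked sessions are never skipped, so a partial watermark
///   can always be resumed regardless of file age.
/// - Untracked files are skipped only when their mtime is known and older
///   than the cutoff (fail-open when either value is missing).
fn should_skip_subagent(
    already_tracked: bool,
    mtime: Option<SystemTime>,
    cutoff: Option<SystemTime>,
) -> bool {
    if already_tracked {
        return false;
    }
    match (mtime, cutoff) {
        (Some(mtime), Some(cutoff)) => mtime < cutoff,
        _ => false,
    }
}
```

With this ordering, Subagent A from the scenario (tracked, mtime two weeks old) is still processed after a restart, while an untracked file of the same age is filtered out.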

svarlamov and others added 3 commits May 26, 2026 23:22
Limits transcript streaming sweep discovery to files modified within
the last 7 days by default, reducing unnecessary I/O and memory usage
for users with large transcript histories. Checkpoint-triggered
(immediate) processing is unaffected by lookback.

Configuration:
- `transcript_streaming_lookback_days` in ~/.git-ai/config.json
- `GIT_AI_TRANSCRIPT_STREAMING_LOOKBACK_DAYS` env var override
- Set to 0 to disable lookback (process all history)
- Default: 7 days

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Stores lookback_days instead of a pre-computed SystemTime cutoff so the
rolling window remains accurate over long daemon lifetimes (the cutoff
was previously fixed at construction time and would drift).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Already-tracked sessions with partial watermarks are always resumed
regardless of file age, preventing data loss on daemon restart or
delayed processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
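
The second commit's change (storing `lookback_days` rather than a fixed cutoff) could look roughly like the sketch below. `LookbackWindow` and `cutoff_at` are hypothetical names; the point is only that the cutoff is derived from the current time on each sweep, so it cannot drift over a long daemon lifetime.

```rust
use std::time::{Duration, SystemTime};

/// Holds the configured window length, not a pre-computed point in time.
struct LookbackWindow {
    lookback_days: u64,
}

impl LookbackWindow {
    /// Compute the cutoff relative to `now`.
    ///
    /// Returns `None` when the lookback is disabled (0 days) or when the
    /// subtraction cannot be represented, which callers can treat as "no limit".
    fn cutoff_at(&self, now: SystemTime) -> Option<SystemTime> {
        if self.lookback_days == 0 {
            return None;
        }
        let secs = self.lookback_days.checked_mul(24 * 60 * 60)?;
        now.checked_sub(Duration::from_secs(secs))
    }
}
```

Calling `cutoff_at(SystemTime::now())` at the start of every sweep gives a rolling window; taking `now` as a parameter also makes the logic easy to unit test with fixed timestamps.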
@svarlamov svarlamov force-pushed the feat/transcript-streaming-default-lookback branch from 84927dd to ad1bb16 Compare May 26, 2026 23:24