Skip to content

fix(tui): retry composer history atomic writes#2277

Open
Hmbown wants to merge 1 commit into
mainfrom
fix/composer-history-windows-retry
Open

fix(tui): retry composer history atomic writes#2277
Hmbown wants to merge 1 commit into
mainfrom
fix/composer-history-windows-retry

Conversation

@Hmbown
Copy link
Copy Markdown
Owner

@Hmbown Hmbown commented May 27, 2026

Summary

Refs #2274.

Verification

  • CARGO_TARGET_DIR=/Volumes/VIXinSSD/whalebro/codewhale/target cargo test -p codewhale-tui append_history_dispatched_does_not_block_the_caller -- --nocapture
  • CARGO_TARGET_DIR=/Volumes/VIXinSSD/whalebro/codewhale/target cargo check -p codewhale-tui --all-features --locked
  • cargo fmt --all -- --check
  • git diff --check

Copilot AI review requested due to automatic review settings May 27, 2026 11:31
Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmbown has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a retry mechanism with backoff delays (write_history_atomic) when persisting composer history to disk. The reviewer suggested simplifying the retry loop to make it more idiomatic and readable by removing the complex iterator chain and unreachable! macro, handling the final write attempt outside the loop instead.

Comment on lines +212 to +232
for (attempt, delay) in RETRY_DELAYS
.iter()
.map(Some)
.chain(std::iter::once(None))
.enumerate()
{
match crate::utils::write_atomic(path, payload) {
Ok(()) => return Ok(()),
Err(err) if delay.is_some() => {
tracing::debug!(
"Retrying composer history write to {} after attempt {} failed: {err}",
path.display(),
attempt + 1
);
std::thread::sleep(*delay.expect("delay checked"));
}
Err(err) => return Err(err),
}
}

unreachable!("retry iterator always ends with a final write attempt")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current retry loop uses a complex iterator chain (.map(Some).chain(std::iter::once(None))) and an unreachable! macro to handle the final write attempt. We can simplify this significantly and make it more idiomatic by looping over the retry delays and performing the final write attempt outside the loop. This avoids the need for option wrapping, unwrapping, and the unreachable! assertion.

    for (attempt, &delay) in RETRY_DELAYS.iter().enumerate() {
        match crate::utils::write_atomic(path, payload) {
            Ok(()) => return Ok(()),
            Err(err) => {
                tracing::debug!(
                    "Retrying composer history write to {} after attempt {} failed: {err}",
                    path.display(),
                    attempt + 1
                );
                std::thread::sleep(delay);
            }
        }
    }

    crate::utils::write_atomic(path, payload)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Hmbown
Copy link
Copy Markdown
Owner Author

Hmbown commented May 27, 2026

Independent review:

  • Failure mode silently loses entries. When the 8 attempts exhaust, append_history_entries_to only logs a tracing::warn! and drops the entire entries Vec (line 193-198). The trimmed/deduped new entries never reach disk and are not requeued onto the writer channel. Worth either (a) re-sending the failed payload back through channel() for the next batch cycle, or (b) at minimum emitting a user-visible notice — losing prompt history silently is the worst possible outcome of a retry that exists to prevent loss.
  • write_atomic doesn't fsync the parent dir. utils.rs:174 does sync_all() on the temp file and persist() (rename), but no File::open(parent).sync_all(). On a crash between rename and dir-sync, ext4/xfs can lose the rename. Out of scope for this PR but worth a follow-up since the doc comment promises crash durability.
  • PR description mentions "readers polling" but only the test polls (line 353, 25ms loop). The real binary calls load_history() exactly once at startup (app.rs:1797). On Windows the production sharing-violation source is more likely AV/indexers than the app itself — the retry is still correct, but the rationale in the PR body is misleading.
  • No regression test for the retry path — only the existing throughput test. A test that injects a transient failure (e.g., holding an open handle for ~50ms) and asserts the write eventually succeeds would lock in this fix.
  • No jitter in the backoff. Fine for a single writer thread, but if two CodeWhale processes share a HOME they'd lockstep.
  • v0.8.48 (PR refactor: consolidate workspace crates — 14→11 (delete tui-core, merge hooks+agent) #2256) does not touch composer_history.rs — clean merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants