Skip to content

Fix v2 migration generation packing#1124

Draft
pfleidi wants to merge 12 commits intomainfrom
fix/checkpoints-v2-generation-refs-during-migration
Draft

Fix v2 migration generation packing#1124
pfleidi wants to merge 12 commits intomainfrom
fix/checkpoints-v2-generation-refs-during-migration

Conversation

@pfleidi
Copy link
Copy Markdown
Contributor

@pfleidi pfleidi commented May 6, 2026

https://entire.io/gh/entireio/cli/trails/306

What

Fix v2 checkpoint migration so under-threshold raw transcript batches stay in refs/entire/checkpoints/v2/full/current instead of creating an archived refs/entire/checkpoints/v2/full/ generation. This keeps migration behavior aligned with normal v2 generation rotation.

How

  • Reworked migration full-transcript packing to archive only full batches and leave the final partial batch in /full/current.
  • Preserved chronological packing and raw transcript readability, including reruns and force migration cleanup.
  • Added compare-and-set guards around /full/current updates so concurrent writers are not overwritten.
  • Centralized archived generation ref formatting in the checkpoint package.

Verification

  • mise run build
  • mise run lint
  • mise run test
  • mise run test:integration

Note

Medium Risk
Changes migration and generation-rotation behavior for v2 /full/* refs and adds compare-and-set ref updates; mistakes could lead to lost or duplicated raw transcript data during migration or concurrent writes.

Overview
Fixes v2 checkpoint migration so only full batches of raw transcripts are archived into numbered /full/<n> refs, while the final partial batch stays in /full/current (matching normal rotation behavior) and is rotated only if it reaches the threshold.

Refactors generation rotation into RotateCurrentGenerationIfNeeded, centralizes archived ref naming via ArchivedGenerationRefName, and adds atomic reset logic (CheckAndSetReference) to avoid overwriting /full/current when it changed concurrently.

Reworks migration packing to replace the old generationPacker with explicit batch buffering plus safer /full/current updates (including eviction of stale chunked transcript artifacts), and expands tests to cover under-threshold behavior, threshold-triggered rotation, chronological packing, and concurrent ref-update rejection.

Reviewed by Cursor Bugbot for commit ac6ed4f. Configure here.

pfleidi added 12 commits May 5, 2026 14:25
…o fix/checkpoints-v2-generation-refs-during-migration
Entire-Checkpoint: 3c53429995f2
Entire-Checkpoint: aca92a1bf611
Drop the flattenMigratedFullEntries wrapper that built a map only to copy
it into a slice that the caller iterated back into a map. FlattenTree
now writes directly into the destination map, and GenerationFileName is
removed with a single delete.

Flatten the nested if/else around GetRefState into two early-exit blocks;
GetRefState already returns ZeroHash on not-found, so no manual reset of
parentHash is needed.

Entire-Checkpoint: 4eeda0c359bc
Entire-Checkpoint: 9987534d89d5
Entire-Checkpoint: 23eeb20faa1c
Entire-Checkpoint: db62fad97d90
Replace migratedFullEntrySet.toMap with mergeInto so writeMigratedFullGeneration
and writeMigratedFinalFullCurrent share the raw-overrides-task merge rule.
Move the per-session raw artifact eviction into a free-standing
evictMigratedRawArtifacts helper so migratedFullEntrySet no longer carries
rawArtifactPaths state for a single consumer.

Also tighten the comment placement on sortMigratableCheckpoints, simplify a
trailing return, and refresh a stale doc string on
collectMissingFullCheckpointForPacking.

Entire-Checkpoint: 984a336eb487
Entire-Checkpoint: 641bcd407edc
Entire-Checkpoint: d283a23cea17
Copilot AI review requested due to automatic review settings May 6, 2026 01:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reworks v2 checkpoint migration so migrated raw transcripts use the same /full/current vs archived-generation packing behavior as normal v2 writes, while also adding some concurrency protection around /full/current ref updates.

Changes:

  • Replaced the old migration packer flow with explicit batch handling so only full batches are archived and the final partial batch is written to /full/current.
  • Added compare-and-set style handling for /full/current updates during migration and generation rotation.
  • Centralized archived generation ref naming and updated tests to cover under-threshold, rotation, force, and repair scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
cmd/entire/cli/migrate.go Reworks migration batching, /full/current writes, and ref update flow for migrated raw transcripts.
cmd/entire/cli/migrate_test.go Adds regression tests for under-threshold packing, threshold rotation, force behavior, and concurrent ref-update failures.
cmd/entire/cli/checkpoint/v2_generation.go Adds archived ref helper and exposes conditional current-generation rotation with extra concurrency checks.
cmd/entire/cli/checkpoint/v2_generation_test.go Adds tests for archived ref naming and concurrent reset protection during rotation.

Comment thread cmd/entire/cli/migrate.go
Comment on lines +291 to +292
if err := ensureEmptyV2FullCurrent(ctx, repo); err != nil {
return result, writtenRefs, fmt.Errorf("failed to pack migrated raw transcripts: %w", err)
Comment thread cmd/entire/cli/migrate.go
Comment on lines +259 to +273
if nextGeneration == 0 {
// Resolve the archive slot only when the first full batch is ready;
// force migration may prune existing archived refs earlier in the loop.
next, nextErr := v2Store.NextGenerationNumber()
if nextErr != nil {
return result, writtenRefs, fmt.Errorf("list archived v2 generations: %w", nextErr)
}
nextGeneration = next
}
refName := checkpoint.ArchivedGenerationRefName(nextGeneration)
if packErr := writeMigratedFullGeneration(ctx, repo, refName, pendingFull); packErr != nil {
return result, writtenRefs, fmt.Errorf("failed to pack migrated raw transcripts: %w", packErr)
}
writtenRefs = append(writtenRefs, refName)
nextGeneration++
Comment on lines 518 to +520
archiveRef := plumbing.NewHashReference(archiveRefName, currentRef.Hash())
if err := s.repo.Storer.SetReference(archiveRef); err != nil {
return fmt.Errorf("rotation: failed to create archived ref %s: %w", archiveRefName, err)
return "", false, fmt.Errorf("rotation: failed to create archived ref %s: %w", archiveRefName, err)
Comment thread cmd/entire/cli/migrate.go
Comment on lines +281 to +289
if len(pendingFull) > 0 {
if err := writeMigratedFinalFullCurrent(ctx, repo, v2Store, pendingFull); err != nil {
return result, writtenRefs, fmt.Errorf("failed to pack migrated raw transcripts: %w", err)
}
if refName, rotated, err := v2Store.RotateCurrentGenerationIfNeeded(ctx, batchSize); err != nil {
return result, writtenRefs, fmt.Errorf("failed to rotate migrated full/current generation: %w", err)
} else if rotated {
writtenRefs = append(writtenRefs, refName)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants