release-25.4: crosscluster/physical: fix ingest retry progress bug #168337

Merged
trunk-io[bot] merged 2 commits into cockroachdb:release-25.4 from msbutler:backport25.4-168180
Apr 14, 2026

@msbutler
Collaborator

@msbutler msbutler commented Apr 14, 2026

crosscluster/physical: fix ingest retry progress bug

This patch fixes two bugs in ingestWithRetries.

  1. The resumer's job never has its in-memory progress updated within a
    single resume, because the frontier processor writes progress updates
    directly to the database by job ID.
  2. previousPersistedSpans is never updated. Combined with the above,
    this means that unless the job is resumed more than once, the branch
    that resets the retrier is never taken, because both values are
    always zero values.

The fix is to refresh the resumer's job and update the previous value
on each loop iteration.

Fixes: #167384

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Release Justification: low risk bug fix

msbutler and others added 2 commits April 14, 2026 11:41
This commit adds a new helper method LoadClaimedJobWithTxn to the job
registry. This method is like LoadClaimedJob but accepts a transaction
parameter, allowing callers to reload a claimed job's state within an
existing transaction context.

This is useful when a job resumer needs to refresh its in-memory view
of a job's progress using a specific transaction, while maintaining
the proper claimed job semantics (session tracking, registry state).

Release note: None

Epic: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
This patch fixes two bugs in `ingestWithRetries`.
1. The resumer's job never has its in-memory progress updated within a
   single resume, because the frontier processor writes progress updates
   directly to the database by job ID.
2. `previousPersistedSpans` is never updated. Combined with the above,
   this means that unless the job is resumed more than once, the branch
   that resets the retrier is never taken, because both values are
   always zero values.

The fix is to refresh the resumer's job and update the previous value
on each loop iteration.

Fixes: cockroachdb#167384

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@msbutler msbutler self-assigned this Apr 14, 2026
@trunk-io
Contributor

trunk-io bot commented Apr 14, 2026

😎 Merged successfully - details.

@cockroach-teamcity
Member

This change is Reviewable

@blathers-crl

blathers-crl bot commented Apr 14, 2026

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes OR fixes for serious issues. Non-production includes test-only changes, build system changes, etc. Serious issues are defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to. Reference the approved ENGREQ ticket in the PR body (e.g., "Fixes ENGREQ-123").

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl blathers-crl bot added the backport and T-disaster-recovery labels Apr 14, 2026
@msbutler msbutler marked this pull request as ready for review April 14, 2026 17:15
@msbutler msbutler requested review from a team as code owners April 14, 2026 17:15
@msbutler msbutler requested review from dt and removed request for a team April 14, 2026 17:15
@trunk-io trunk-io bot merged commit cad95a0 into cockroachdb:release-25.4 Apr 14, 2026
22 checks passed
Labels

backport, T-disaster-recovery, target-release-25.4.10

3 participants