Skip to content

[ISSUE #10417][BUG] Fix stale minOffset after truncate#10418

Open
chenxu80 wants to merge 2 commits into
apache:developfrom
chenxu80:bugfix/fix-stale-minoffset-after-truncate
Open

[ISSUE #10417][BUG] Fix stale minOffset after truncate#10418
chenxu80 wants to merge 2 commits into
apache:developfrom
chenxu80:bugfix/fix-stale-minoffset-after-truncate

Conversation

@chenxu80
Copy link
Copy Markdown

@chenxu80 chenxu80 commented Jun 2, 2026

Which Issue(s) This PR Fixes

Summary

  • fix a case where broker-side commitlog truncation removes all ConsumeQueue files for a topic queue but leaves a stale
    minOffset
  • reset the queue's logical minimum offset when the queue becomes empty, so minOffset stays consistent with maxOffset
  • avoid exposing an invalid state where maxOffset == 0 but minOffset > 0

Root cause

When all ConsumeQueue files of a topic queue are deleted during commitlog truncation, the in-memory logical minimum offset may
not be reset. As a result, later minOffset queries can return a stale value even though the queue is already empty.

Test Plan

  • reproduce the scenario where commitlog truncation removes all ConsumeQueue files for a topic queue
  • verify the queue reports minOffset == 0 and maxOffset == 0 after truncation
  • run the related store tests

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.90%. Comparing base (a6fb9e2) to head (4d6d319).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
...n/java/org/apache/rocketmq/store/ConsumeQueue.java 12.50% 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10418      +/-   ##
=============================================
- Coverage      48.03%   47.90%   -0.13%     
+ Complexity     13286    13245      -41     
=============================================
  Files           1376     1376              
  Lines         100558   100562       +4     
  Branches       12985    12986       +1     
=============================================
- Hits           48305    48179     -126     
- Misses         46333    46435     +102     
- Partials        5920     5948      +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lollipopjin
Copy link
Copy Markdown
Contributor

Since this bug can surface during HA master-slave switch or CommitLog truncation, would you consider adding a test that deletes all ConsumeQueue files and then verifies correctMinOffset properly resets minLogicOffset and maxPhysicOffset? That would help guard against future regressions in this code path.

Copy link
Copy Markdown

@oss-taishan-ai oss-taishan-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Fixes a bug where ConsumeQueue#correctMinOffset fails to reset minLogicOffset when commitlog truncation deletes all ConsumeQueue files for a topic queue, leaving an inconsistent state where minOffset > maxOffset.

Findings

  • [Correctness] ✅ The fix is well-structured and addresses the root cause:

    1. Reordering checksgetLastMappedFile() == null is now checked first. This is the critical case where all CQ files have been deleted, and the queue must be reset to empty state (minLogicOffset = 0, maxPhysicOffset = -1). Previously this was checked second, and the first >= check would short-circuit and return without any reset.

    2. >=> change — When minLogicOffset == maxOffset, the queue is legitimately empty (min points to the end, which equals the current max). This is not an anomalous state and does not require a reset. The > guard correctly targets only the truly invalid case where minLogicOffset exceeds maxOffset.

    3. Reset valuesminLogicOffset = 0 matches the default initialization (line 78). maxPhysicOffset = -1 matches the default initialization (line 73). This is consistent with reset() (line 1087).

  • [Performance] ✅ No performance concern. The getLastMappedFile() call was already present in the original code; it's simply moved earlier in the method.

  • [Compatibility] ✅ No API or protocol changes. Internal state management only.

  • [Tests] ⚠️ No unit test added. The scenario (commitlog truncation removing all CQ files) is complex to reproduce in a unit test, but the fix is small enough that a targeted test for correctMinOffset with a mocked/empty mappedFileQueue would be valuable to prevent regression. Consider adding:

    @Test
    public void testCorrectMinOffsetWhenAllFilesDeleted() {
        // Setup: create a ConsumeQueue, write some entries, then delete all mapped files
        // After calling correctMinOffset:
        //   assertEquals(0, consumeQueue.getMinOffsetInQueue() * CQ_STORE_UNIT_SIZE);
        //   assertEquals(-1, consumeQueue.getMaxPhysicOffset());
    }

Suggestions

  1. Add a unit test — As noted above, a test covering the "all CQ files deleted" path would strengthen this fix.

  2. Consider thread safetyminLogicOffset is volatile (good), but maxPhysicOffset is not. In the reset path, both are written sequentially. If another thread reads minLogicOffset after it's reset to 0 but before maxPhysicOffset is reset to -1, the intermediate state could be briefly inconsistent. This is likely benign (the queue is empty, no consumer should be iterating), but worth a quick look to confirm no reader depends on both values being consistent simultaneously.

CI

✅ All 10 checks pass (CodeQL, bazel-compile, maven-compile × 4, coverage, license, misspell).


Automated review by github-manager-bot

@chenxu80
Copy link
Copy Markdown
Author

chenxu80 commented Jun 5, 2026

@lollipopjin Thanks for the suggestion. I have added a regression test in ConsumeQueueTest for the case where all ConsumeQueue files are deleted, to verify that correctMinOffset properly resets minLogicOffset to 0 and maxPhysicOffset to -1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Stale minOffset after truncating all ConsumeQueue files

4 participants