Skip to content

[ISSUE #10423] Optimize LiteConsumerLagCalculator.getLagCountTopK by deferring getStoreTimestamp to topK results only#10424

Open
f1amingo wants to merge 1 commit into
apache:developfrom
f1amingo:lite_consumer_lag
Open

[ISSUE #10423] Optimize LiteConsumerLagCalculator.getLagCountTopK by deferring getStoreTimestamp to topK results only#10424
f1amingo wants to merge 1 commit into
apache:developfrom
f1amingo:lite_consumer_lag

Conversation

@f1amingo
Copy link
Copy Markdown
Contributor

@f1amingo f1amingo commented Jun 3, 2026

What is the purpose of the change

LiteConsumerLagCalculator.getLagCountTopK currently calls getStoreTimestamp() (disk I/O) for every offset entry, creates a LiteLagInfo for each, and uses String.split() for key parsing. This PR optimizes these hotspots.

Brief changelog

  • Defer getStoreTimestamp to topK results only: accumulate consumerOffset in LiteLagInfo during traversal, compute timestamps for the final topK entries after heap filtering (N → K calls).
  • Lazy object creation: only instantiate LiteLagInfo when the entry qualifies for heap insertion.
  • Replace split() with indexOf(): parse topicAtGroup key via indexOf + substring, avoiding array allocation; check lite prefix before string extraction.
  • Replace AtomicLong with long[]: single-threaded traversal needs no atomic overhead.

Verifying this change

Existing unit tests cover the changed methods.

@f1amingo f1amingo force-pushed the lite_consumer_lag branch 2 times, most recently from 061b5f2 to 634bca0 Compare June 4, 2026 02:24
Copy link
Copy Markdown
Contributor

@imzs imzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a necessary optimization

@f1amingo f1amingo force-pushed the lite_consumer_lag branch 2 times, most recently from 43a1259 to 21e215a Compare June 4, 2026 02:58
…pK by deferring getStoreTimestamp to topK results only
@f1amingo f1amingo force-pushed the lite_consumer_lag branch from 21e215a to d46924f Compare June 4, 2026 03:06
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 51.06383% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.94%. Comparing base (2a9560c) to head (d46924f).

Files with missing lines Patch % Lines
...tools/command/lite/GetLiteGroupInfoSubCommand.java 0.00% 11 Missing ⚠️
...etmq/broker/metrics/LiteConsumerLagCalculator.java 75.00% 3 Missing and 5 partials ⚠️
...a/org/apache/rocketmq/common/lite/LiteLagInfo.java 0.00% 4 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10424      +/-   ##
=============================================
- Coverage      48.05%   47.94%   -0.11%     
+ Complexity     13303    13276      -27     
=============================================
  Files           1377     1377              
  Lines         100611   100629      +18     
  Branches       12991    12994       +3     
=============================================
- Hits           48347    48246     -101     
- Misses         46348    46437      +89     
- Partials        5916     5946      +30     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@oss-taishan-ai oss-taishan-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Optimizes LiteConsumerLagCalculator.getLagCountTopK by deferring expensive getStoreTimestamp calls to topK results only, lazy object creation, replacing split() with indexOf(), and replacing AtomicLong with long[].

Findings

  • [Correctness] ✅ Core optimizations are correct:

    • AtomicLonglong[]: Correct since offsetTableForEachByGroup is single-threaded
    • split()indexOf() + substring(): Equivalent behavior, avoids array allocation
    • isLiteTopicQueue(topicAtGroup) instead of isLiteTopicQueue(topicGroup[0]): Correct because isLiteTopicQueue only checks startsWith prefix
    • Lazy LiteLagInfo creation: Only instantiates when qualifying for heap insertion
  • [Correctness] ⚠️ Post-heap lmqName reconstruction (lines 240-247):

    String parentTopic = LiteMetadataUtil.getLiteBindTopic(group, brokerController);
    for (LiteLagInfo lagInfo : topList) {
        String lmqName = LiteUtil.toLmqName(parentTopic, lagInfo.getLiteTopic());
        lagInfo.setEarliestUnconsumedTimestamp(getStoreTimestamp(lmqName, consumerOffset));
    }

    This reconstructs lmqName from group config instead of using the original lmqName from the offset table. This works in practice because the only caller (LiteManagerProcessor.getLiteGroupInfo) validates the group and liteBindTopic before calling getLagCountTopK. However, the method signature allows group to be null, and if getLiteBindTopic returns null, toLmqName returns null, causing getStoreTimestamp(null, ...) to fail. Consider adding a null guard:

    if (parentTopic == null) {
        // Fall back to original behavior or skip timestamp lookup
        return Pair.of(topList, totalLagCount[0]);
    }
  • [Readability] ⚠️ Operator precedence (line 222):

    if (minHeap.size() < topK || minHeap.peek() != null && diff > minHeap.peek().getLagCount())

    Java evaluates && before ||, so this is parsed as size < topK || (peek != null && diff > peek.getLagCount()), which is correct. Adding explicit parentheses would improve readability and prevent future maintenance errors:

    if (minHeap.size() < topK || (minHeap.peek() != null && diff > minHeap.peek().getLagCount()))
  • [Performance] ✅ Excellent improvements:

    • N → K getStoreTimestamp calls (major I/O reduction for groups with thousands of lite topics)
    • N → K LiteLagInfo allocations
    • No String.split() array allocations per entry
    • No AtomicLong CAS overhead
  • [Tests] ✅ Test testGetLagCountTopK_NormalCase covers the new behavior including timestamp deferral and mocks SubscriptionGroupManager for liteBindTopic lookup.

  • [CI] ✅ All 10 checks pass (CodeQL, bazel-compile, maven-compile × 4, coverage, license, misspell).

Suggestions

  1. Add null guard for parentTopic in the post-heap loop to handle edge cases defensively
  2. Add explicit parentheses in the heap condition for readability
  3. Consider adding a comment explaining why lmqName reconstruction is safe (i.e., caller guarantees valid group with liteBindTopic)

Automated review by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] Optimize LiteConsumerLagCalculator.getLagCountTopK by deferring expensive getStoreTimestamp to topK results only

4 participants