Skip to content

[Bug fix] Skip flushing empty chunk in OffHeapH3IndexCreator#18834

Open
J-HowHuang wants to merge 3 commits into
apache:masterfrom
J-HowHuang:fix-h3-offheap-empty-chunk
Open

[Bug fix] Skip flushing empty chunk in OffHeapH3IndexCreator#18834
J-HowHuang wants to merge 3 commits into
apache:masterfrom
J-HowHuang:fix-h3-offheap-empty-chunk

Conversation

@J-HowHuang

@J-HowHuang J-HowHuang commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Desciption

H3 index could run into java.lang.IndexOutOfBoundsException

java.lang.IndexOutOfBoundsException
      at java.base/java.nio.Buffer$1.apply(Buffer.java:757)
      at java.base/java.nio.Buffer$1.apply(Buffer.java:754)
      at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:213)
      at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:210)
      at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:98)
      at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
      at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
      at java.base/java.nio.Buffer.checkIndex(Buffer.java:779)
      at java.base/java.nio.DirectByteBuffer.getLong(DirectByteBuffer.java:861)
      at org.apache.pinot.segment.spi.memory.PinotByteBuffer.getLong(PinotByteBuffer.java:181)
      at org.apache.pinot.segment.local.segment.creator.impl.inv.geospatial.OffHeapH3IndexCreator$ChunkIt
  erator.next(OffHeapH3IndexCreator.java:193)
      at org.apache.pinot.segment.local.segment.creator.impl.inv.geospatial.OffHeapH3IndexCreator.seal(Of
  fHeapH3IndexCreator.java:138)
      at org.apache.pinot.segment.local.segment.creator.impl.ColumnIndexCreators.seal(ColumnIndexCreators
  .java:98)
      at org.apache.pinot.segment.local.segment.creator.impl.BaseSegmentCreator.flushColIndexes(BaseSegme
  ntCreator.java:818)
      at org.apache.pinot.segment.local.segment.creator.impl.BaseSegmentCreator.seal(BaseSegmentCreator.j
  ava:753)

because empty chunk could be added into the chunk list. An empty chunk would cause out of bound read in ChunkIterator

Change

  • Don't add when posting list is empty while flush
  • Check and skip empty chunk while iterating

Test

Added an unit test and verified it would fail without the patch

@codecov-commenter

codecov-commenter commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.77%. Comparing base (d469cb1) to head (8bf28c1).
⚠️ Report is 30 commits behind head on master.

Files with missing lines Patch % Lines
...tor/impl/inv/geospatial/OffHeapH3IndexCreator.java 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18834      +/-   ##
============================================
- Coverage     64.78%   64.77%   -0.01%     
- Complexity     1309     1319      +10     
============================================
  Files          3381     3392      +11     
  Lines        209967   210989    +1022     
  Branches      32891    33128     +237     
============================================
+ Hits         136020   136664     +644     
- Misses        62979    63301     +322     
- Partials      10968    11024      +56     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.77% <50.00%> (-0.01%) ⬇️
temurin 64.77% <50.00%> (-0.01%) ⬇️
unittests 64.77% <50.00%> (-0.01%) ⬇️
unittests1 56.96% <0.00%> (+<0.01%) ⬆️
unittests2 37.20% <50.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Jackie-Jiang Jackie-Jiang added bug Something is not working as expected ingestion Related to data ingestion pipeline labels Jun 22, 2026

@Jackie-Jiang Jackie-Jiang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch


// Flush the last chunk
if (_postingListMap.size() % FLUSH_THRESHOLD != 0) {
if (!_postingListMap.isEmpty() && _postingListMap.size() % FLUSH_THRESHOLD != 0) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) This is actually redundant. Actually it might be more clear to only check whether map is empty

Suggested change
if (!_postingListMap.isEmpty() && _postingListMap.size() % FLUSH_THRESHOLD != 0) {
if (!_postingListMap.isEmpty()) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah this one is a different check. thanks for pointing out

if (_postingListMap.size() % FLUSH_THRESHOLD == 0) {
// Skip flushing an empty posting list map. This could happen when add() didn't add any entry due to invalid
// Geometry and _continueOnError is set
if (!_postingListMap.isEmpty() && _postingListMap.size() % FLUSH_THRESHOLD == 0) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we simply do:

Suggested change
if (!_postingListMap.isEmpty() && _postingListMap.size() % FLUSH_THRESHOLD == 0) {
if (_postingListMap.size() == FLUSH_THRESHOLD) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, add() always only adds one entry at a time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is not working as expected ingestion Related to data ingestion pipeline

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants