[core] Fix BSI reader predicate pruning for Long.MIN_VALUE boundary #8150

Open
lxy-9602 wants to merge 2 commits into apache:master from lxy-9602:fix-bsi-index
lxy-9602 commented Jun 7, 2026

Purpose

Fix incorrect predicate pruning in BitSliceIndexBitmapFileIndex.Reader when the query literal is Long.MIN_VALUE.

Problem

The BSI reader calls Math.abs(value) to convert a negative literal to its magnitude before querying the negative BSI. However, Math.abs(Long.MIN_VALUE) overflows and returns Long.MIN_VALUE itself (still negative), because a two's-complement long has no positive counterpart for that value. This causes incorrect results for all comparison predicates:

| Predicate | Expected | Actual (before fix) |
| --- | --- | --- |
| x > Long.MIN_VALUE | all non-null rows | only positive rows (negative rows dropped) |
| x >= Long.MIN_VALUE | all non-null rows | only positive rows (negative rows dropped) |
| x < Long.MIN_VALUE | empty | negative rows incorrectly kept |
| x <= Long.MIN_VALUE | empty | negative rows incorrectly kept |
| x == Long.MIN_VALUE | empty | incorrect result |
| x != Long.MIN_VALUE | all non-null rows | incorrect result |

Root Cause

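Math.abs(Long.MIN_VALUE) == Long.MIN_VALUE. This is the only long value for which Math.abs overflows: its magnitude, 2^63, is one larger than Long.MAX_VALUE, so the negation wraps back to the same bit pattern.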
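
For reviewers who want to confirm the overflow locally, here is a minimal standalone sketch. The class and helper names (AbsOverflowDemo, absOverflows) are illustrative only and are not part of this PR.

```java
// Standalone demo of the Math.abs overflow; names are hypothetical, not from the PR.
class AbsOverflowDemo {

    // Returns true when Math.abs cannot produce a non-negative result for v.
    static boolean absOverflows(long v) {
        return Math.abs(v) < 0;
    }

    public static void main(String[] args) {
        // The single overflowing input: the result is still Long.MIN_VALUE.
        System.out.println(Math.abs(Long.MIN_VALUE) == Long.MIN_VALUE);     // true
        System.out.println(absOverflows(Long.MIN_VALUE));                   // true

        // The neighbouring value is fine: its magnitude is exactly Long.MAX_VALUE.
        System.out.println(absOverflows(Long.MIN_VALUE + 1));               // false
        System.out.println(Math.abs(Long.MIN_VALUE + 1) == Long.MAX_VALUE); // true
    }
}
```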

Fix

Add an early check for value == Long.MIN_VALUE in visitIn, visitNotIn, visitLessThan, visitLessOrEqual, visitGreaterThan, and visitGreaterOrEqual. Since the BSI writer cannot store Long.MIN_VALUE (the Appender rejects a negative min), the index can never contain this value. Therefore:

  • GT / GTE with Long.MIN_VALUE → return all non-null rows
  • LT / LTE with Long.MIN_VALUE → return empty
  • EQ / IN with Long.MIN_VALUE → return empty
  • NEQ / NOT IN with Long.MIN_VALUE → return all non-null rows
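
The shape of the early check can be sketched with a toy model. This is not the code in BitSliceIndexBitmapFileIndex.Reader: ToyBsiReader, its fields, and its row-scanning comparison are hypothetical stand-ins, assuming only that negative values are indexed by magnitude, as described above. Without the guard, a literal of Long.MIN_VALUE would reach Math.abs and produce a negative "magnitude", which is what drops or wrongly keeps the negative rows.

```java
import java.util.BitSet;

// Toy model of a sign-split index; hypothetical, not Paimon's reader.
class ToyBsiReader {
    private final BitSet positive = new BitSet(); // rows with value >= 0
    private final BitSet negative = new BitSet(); // rows with value < 0
    private final long[] magnitude;               // |value| per row; unused for null rows

    ToyBsiReader(Long[] rows) {
        magnitude = new long[rows.length];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i] == null) continue; // null rows belong to neither side
            if (rows[i] == Long.MIN_VALUE) {
                throw new IllegalArgumentException("values should be non-negative");
            }
            magnitude[i] = Math.abs(rows[i]);
            (rows[i] >= 0 ? positive : negative).set(i);
        }
    }

    BitSet nonNull() {
        BitSet all = (BitSet) positive.clone();
        all.or(negative);
        return all;
    }

    // x > literal
    BitSet greaterThan(long literal) {
        if (literal == Long.MIN_VALUE) {
            return nonNull(); // every storable value is greater than Long.MIN_VALUE
        }
        BitSet result = new BitSet();
        if (literal >= 0) {
            for (int i = positive.nextSetBit(0); i >= 0; i = positive.nextSetBit(i + 1)) {
                if (magnitude[i] > literal) result.set(i);
            }
        } else {
            result.or(positive);          // all non-negative rows match
            long abs = Math.abs(literal); // safe: literal != Long.MIN_VALUE
            for (int i = negative.nextSetBit(0); i >= 0; i = negative.nextSetBit(i + 1)) {
                if (magnitude[i] < abs) result.set(i);
            }
        }
        return result;
    }

    // x < literal
    BitSet lessThan(long literal) {
        if (literal == Long.MIN_VALUE) {
            return new BitSet(); // nothing is smaller than Long.MIN_VALUE
        }
        BitSet result = new BitSet();
        if (literal >= 0) {
            result.or(negative);          // all negative rows match
            for (int i = positive.nextSetBit(0); i >= 0; i = positive.nextSetBit(i + 1)) {
                if (magnitude[i] < literal) result.set(i);
            }
        } else {
            long abs = Math.abs(literal); // safe: literal != Long.MIN_VALUE
            for (int i = negative.nextSetBit(0); i >= 0; i = negative.nextSetBit(i + 1)) {
                if (magnitude[i] > abs) result.set(i);
            }
        }
        return result;
    }
}
```

With rows {5, -3, null, 0, -7}, greaterThan(Long.MIN_VALUE) selects the four non-null rows and lessThan(Long.MIN_VALUE) selects none, matching the Expected column in the Problem table.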

Note: The writer-side issue (Math.abs(Long.MIN_VALUE) in StatsCollectList.collect() and serializedBytes()) is a known limitation — the writer rejects Long.MIN_VALUE with an IllegalArgumentException. A separate writer test is added to document this behavior.
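
One plausible way the writer-side rejection arises is sketched below, assuming the overflowed magnitude is what trips the non-negative check. ToyWriterCheck, collect, and validateRange are hypothetical names; only the exception type and message are taken from this PR, and the real call path through StatsCollectList and the Appender may differ.

```java
// Hypothetical model of the writer-side rejection; not Paimon's actual code.
class ToyWriterCheck {

    // Stand-in for the appender's range validation (assumed behavior).
    static void validateRange(long min, long max) {
        if (min < 0) {
            throw new IllegalArgumentException("values should be non-negative");
        }
    }

    // Stand-in for collecting one value: the magnitude is computed with Math.abs,
    // which stays negative for Long.MIN_VALUE and therefore fails validation.
    static void collect(long value) {
        long magnitude = Math.abs(value);
        validateRange(magnitude, magnitude);
    }

    public static void main(String[] args) {
        collect(-7L); // accepted: magnitude is 7
        try {
            collect(Long.MIN_VALUE);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // values should be non-negative
        }
    }
}
```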

Testing

  • testReaderPredicatePruningWithLongMinValue — verifies all 6 predicates (>, >=, <, <=, ==, !=) with Long.MIN_VALUE as the query literal against data containing negative values.
  • testWriterCannotHandleLongMinValue — documents that the writer throws IllegalArgumentException("values should be non-negative") when attempting to write Long.MIN_VALUE.
