HBASE-30150: Delegate getHintForRejectedRow and getSkipHint in composite filters. #8217

Open
shubham-roy wants to merge 3 commits into apache:master from shubham-roy:master-HBASE-30150

shubham-roy commented May 11, 2026

HBASE-30150: Delegate getHintForRejectedRow and getSkipHint in composite filters

Problem

HBASE-29974 introduced two new filter optimization methods — getHintForRejectedRow and getSkipHint — that allow filters to provide seek hints to the
scan pipeline. However, composite filter wrappers (FilterList, SkipFilter, WhileMatchFilter) did not delegate these methods to their sub-filters.
Hints from sub-filters were silently discarded, severely limiting the optimization's practical impact.

Solution

Implement delegation for both methods in all composite filter wrappers:

FilterList (AND / MUST_PASS_ALL):

  • getHintForRejectedRow: Maximal step — returns the farthest hint among sub-filters that actually rejected the row via filterRowKey. Only
    sub-filters whose filterRowKey returned true are consulted, honouring the per-filter contract. Null hints are ignored.
  • getSkipHint: Maximal step — returns the farthest hint among all sub-filters. Null hints are ignored.
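
The maximal-step rule for AND can be sketched as follows. This is a simplified illustration, not the code in FilterListWithAND: row keys are modelled as plain String values instead of Cell, and the class name AndHintMergeSketch and the helper farthestHint are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class AndHintMergeSketch {
  // Hypothetical helper: returns the farthest non-null hint in scan order,
  // or null when no sub-filter offered a hint.
  static String farthestHint(List<String> hints, Comparator<String> scanOrder) {
    String farthest = null;
    for (String hint : hints) {
      if (hint == null) {
        continue; // a null hint means "no opinion"; under AND it is ignored
      }
      if (farthest == null || scanOrder.compare(hint, farthest) > 0) {
        farthest = hint;
      }
    }
    return farthest;
  }

  public static void main(String[] args) {
    // Strings stand in for row keys (assumption made for this sketch only).
    List<String> hints = Arrays.asList("row-03", null, "row-07");
    System.out.println(farthestHint(hints, Comparator.naturalOrder())); // prints row-07
  }
}
```

Taking the farthest hint is safe under AND because a row must pass every sub-filter, so any row before the farthest target is already known to be rejected by at least one of them.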

FilterList (OR / MUST_PASS_ONE):

  • getHintForRejectedRow: Minimal step — returns the nearest hint. If any non-terminated sub-filter returns null, the composite returns null (cannot
    safely skip).
  • getSkipHint: Same minimal-step semantic with null-collapse.
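
The minimal-step rule with null-collapse can be sketched in the same simplified style. Row keys are again modelled as String values, and the names OrHintMergeSketch, SubFilterHint, and nearestHint are hypothetical. The sketch also assumes that "terminated" means the sub-filter can no longer match anything, so it is skipped rather than allowed to force a null.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrHintMergeSketch {
  // Hypothetical stand-in for one sub-filter's answer: its hint (may be null)
  // and whether that sub-filter has already terminated.
  static final class SubFilterHint {
    final String hint;
    final boolean terminated;

    SubFilterHint(String hint, boolean terminated) {
      this.hint = hint;
      this.terminated = terminated;
    }
  }

  // Returns the nearest hint in scan order, or null as soon as any live
  // sub-filter has no hint (the composite cannot safely skip in that case).
  static String nearestHint(List<SubFilterHint> subs, Comparator<String> scanOrder) {
    String nearest = null;
    for (SubFilterHint sub : subs) {
      if (sub.terminated) {
        continue; // a terminated sub-filter cannot accept any later row
      }
      if (sub.hint == null) {
        return null; // null-collapse: this sub-filter might accept the very next row
      }
      if (nearest == null || scanOrder.compare(sub.hint, nearest) < 0) {
        nearest = sub.hint;
      }
    }
    return nearest;
  }

  public static void main(String[] args) {
    List<SubFilterHint> subs = Arrays.asList(
        new SubFilterHint("row-07", false),
        new SubFilterHint("row-03", false),
        new SubFilterHint(null, true)); // terminated, so its null is ignored
    System.out.println(nearestHint(subs, Comparator.naturalOrder())); // prints row-03
  }
}
```

Under OR a row passes if any sub-filter accepts it, so the composite may only skip as far as the most conservative live sub-filter allows.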

Note on OR composites: Since most HBase filters today do not override these methods (they inherit the return null default from FilterBase), any
OR composite containing such a filter will collapse to null. The OR delegation is required for correct results when sub-filters do
override these methods, but in practice the optimization rarely fires for OR composites in current workloads.

SkipFilter / WhileMatchFilter: Simple passthrough delegation to the wrapped filter.

Leaf filter getSkipHint implementations:

  • ColumnPrefixFilter: Delegates to getNextCellHint with null-prefix guard.
  • ColumnRangeFilter: Delegates to getNextCellHint with null-minColumn guard.
  • MultipleColumnPrefixFilter: Recomputes the correct target prefix from sortedPrefixes statelessly (cannot reuse the mutable hint field since
    filterCell was never called on the structural-skip path).
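
A stateless prefix lookup of the kind described for MultipleColumnPrefixFilter might look roughly like the sketch below. It is an illustration under assumptions, not the filter's code: qualifiers and prefixes are String values rather than byte[], the class and method names are hypothetical, and returning null both when the qualifier already matches a prefix and when no prefix remains is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class PrefixSkipHintSketch {
  // Hypothetical helper: computes the next prefix to seek to from the sorted
  // prefixes and the current qualifier alone, without any stored hint field.
  static String nextTargetPrefix(TreeSet<String> sortedPrefixes, String qualifier) {
    if (sortedPrefixes.isEmpty()) {
      return null;
    }
    String floor = sortedPrefixes.floor(qualifier); // largest prefix <= qualifier
    if (floor == null) {
      return sortedPrefixes.first(); // qualifier sorts before every prefix
    }
    if (qualifier.startsWith(floor)) {
      return null; // already inside a wanted prefix range; nothing to skip
    }
    return sortedPrefixes.higher(floor); // next prefix, or null if none remain
  }

  public static void main(String[] args) {
    TreeSet<String> prefixes = new TreeSet<>(Arrays.asList("b", "d"));
    System.out.println(nextTargetPrefix(prefixes, "a1")); // prints b
    System.out.println(nextTargetPrefix(prefixes, "c1")); // prints d
    System.out.println(nextTargetPrefix(prefixes, "e1")); // prints null
  }
}
```

Because the result depends only on the arguments, it stays valid on a path where filterCell has not run and no per-cell state has been populated.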

Scan direction handling

All composite merging uses FilterListBase.compareCell() which negates the comparison when reversed == true. This means "max" in AND correctly becomes
the smallest row key in reverse scan, and "min" in OR becomes the largest. Unit and integration tests explicitly verify reversed-scan behavior for both
AND and OR, for both APIs.
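
The effect of negating the comparison can be shown with a small sketch. It does not reproduce FilterListBase.compareCell(); row keys are String values and the names ScanOrderSketch, compare, farther, and nearer are hypothetical.

```java
final class ScanOrderSketch {
  // Direction-aware comparison: a positive result means "a comes later in
  // scan order than b". The sign is flipped for reversed scans.
  static int compare(String a, String b, boolean reversed) {
    int cmp = Integer.signum(a.compareTo(b));
    return reversed ? -cmp : cmp;
  }

  // AND-style choice: the hint that is farther along in scan order.
  static String farther(String a, String b, boolean reversed) {
    return compare(a, b, reversed) >= 0 ? a : b;
  }

  // OR-style choice: the hint that is nearer in scan order.
  static String nearer(String a, String b, boolean reversed) {
    return compare(a, b, reversed) <= 0 ? a : b;
  }

  public static void main(String[] args) {
    System.out.println(farther("row-03", "row-07", false)); // prints row-07
    System.out.println(farther("row-03", "row-07", true));  // prints row-03
    System.out.println(nearer("row-03", "row-07", true));   // prints row-07
  }
}
```

Keeping the direction inside the comparison means the merging code can always ask for the "farther" or "nearer" hint without branching on scan direction.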

Contract compliance (FilterListWithAND)

FilterListWithAND tracks which sub-filters returned true from filterRowKey in a boolean[] rejectedByFilterRowKey array (mirroring the existing seekHintFilters pattern used by getNextCellHint). Only those sub-filters are consulted for hints, honouring the Filter.java contract: "Only called after filterRowKey(Cell) has returned true for the same firstRowCell."

This does not apply to getSkipHint, whose contract explicitly permits being called on cells the filter never saw.
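
The tracking idea can be sketched with a toy composite. Everything here is hypothetical apart from the array name rejectedByFilterRowKey: the RowFilter interface is a two-method stand-in for Filter, row keys are String values, only forward scans are handled, and the sketch assumes every sub-filter is evaluated in filterRowKey without short-circuiting.

```java
import java.util.Arrays;
import java.util.List;

final class RejectionTrackingSketch {
  // Hypothetical two-method stand-in for the relevant part of Filter.
  interface RowFilter {
    boolean filterRowKey(String row);
    String getHintForRejectedRow(String row);
  }

  // Demo filter: rejects rows before firstWantedRow and hints to it.
  static RowFilter rejectBefore(final String firstWantedRow) {
    return new RowFilter() {
      @Override public boolean filterRowKey(String row) {
        return row.compareTo(firstWantedRow) < 0;
      }
      @Override public String getHintForRejectedRow(String row) {
        return firstWantedRow;
      }
    };
  }

  // Demo filter: never rejects, and fails loudly if asked for a hint anyway.
  static RowFilter neverRejects() {
    return new RowFilter() {
      @Override public boolean filterRowKey(String row) {
        return false;
      }
      @Override public String getHintForRejectedRow(String row) {
        throw new IllegalStateException("consulted without having rejected the row");
      }
    };
  }

  static final class AndComposite implements RowFilter {
    private final List<RowFilter> filters;
    private final boolean[] rejectedByFilterRowKey;

    AndComposite(List<RowFilter> filters) {
      this.filters = filters;
      this.rejectedByFilterRowKey = new boolean[filters.size()];
    }

    @Override public boolean filterRowKey(String row) {
      boolean rejected = false;
      for (int i = 0; i < filters.size(); i++) {
        // Record each sub-filter's own verdict for the current row.
        rejectedByFilterRowKey[i] = filters.get(i).filterRowKey(row);
        rejected |= rejectedByFilterRowKey[i];
      }
      return rejected;
    }

    @Override public String getHintForRejectedRow(String row) {
      String farthest = null;
      for (int i = 0; i < filters.size(); i++) {
        if (!rejectedByFilterRowKey[i]) {
          continue; // contract: only ask sub-filters that rejected this row
        }
        String hint = filters.get(i).getHintForRejectedRow(row);
        if (hint != null && (farthest == null || hint.compareTo(farthest) > 0)) {
          farthest = hint;
        }
      }
      return farthest;
    }
  }

  public static void main(String[] args) {
    AndComposite composite = new AndComposite(
        Arrays.asList(rejectBefore("row-03"), neverRejects(), rejectBefore("row-07")));
    if (composite.filterRowKey("row-01")) {
      System.out.println(composite.getHintForRejectedRow("row-01")); // prints row-07
    }
  }
}
```

The throwing sub-filter mirrors the contract-enforcement unit test described in this PR: if the composite consulted it, the call would fail instead of silently returning a hint.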

Test coverage

Unit tests (TestFilterListHintDelegation) — 36 tests:

  • AND/OR × getHintForRejectedRow/getSkipHint × forward/reversed/null/all-null/single/empty/terminated
  • Contract enforcement: non-rejecting sub-filter throws if incorrectly consulted
  • Divergent hints: asserts AND returns max when sub-filters hint to different targets
  • SkipFilter and WhileMatchFilter delegation
  • Nested FilterList composition
  • All-sub-filters-terminated edge case for both AND and OR

Integration tests (TestFilterHintForRejectedRow) — 26 tests:

  • AND/OR hint delegation end-to-end with real HRegion scans
  • AND/OR with one null-hint sub-filter
  • Nested FilterList, WhileMatchFilter
  • AND/OR reversed scan
  • Divergent hints (filterA → row-03, filterB → row-07, asserts max)
  • ColumnRangeFilter, ColumnPrefixFilter, MultipleColumnPrefixFilter getSkipHint with time-range gating
  • AND getSkipHint composition

Files changed

| File | Change |
| --- | --- |
| Filter.java | Javadoc: "limitation" → "support"; OR null-collapse semantic documented |
| FilterList.java | Delegates both methods to filterListBase |
| FilterListWithAND.java | Maximal-step merging + rejectedByFilterRowKey tracking |
| FilterListWithOR.java | Minimal-step merging with null-collapse |
| SkipFilter.java | Passthrough delegation |
| WhileMatchFilter.java | Passthrough delegation |
| ColumnPrefixFilter.java | getSkipHint with null-prefix guard |
| ColumnRangeFilter.java | getSkipHint with null-minColumn guard |
| MultipleColumnPrefixFilter.java | Stateless getSkipHint recomputing target from sortedPrefixes |
| TestFilterListHintDelegation.java | 36 unit tests |
| TestFilterHintForRejectedRow.java | 26 integration tests (14 pre-existing from HBASE-29974 + 12 new) |

Shubham Roy and others added 3 commits May 11, 2026 10:46
…ite filters

FilterList (AND/OR), SkipFilter, and WhileMatchFilter now delegate
getHintForRejectedRow() and getSkipHint() to their sub-filters, using
maximal-step merging for AND and minimal-step merging for OR — consistent
with the existing getNextCellHint() convention. ColumnRangeFilter and
ColumnPrefixFilter gain stateless getSkipHint() implementations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd strengthen test coverage

- Add getSkipHint() to MultipleColumnPrefixFilter so it participates in
  the structural-skip hint optimization alongside ColumnPrefixFilter and
  ColumnRangeFilter.
- Add unit tests for the all-sub-filters-terminated edge case in both
  AND and OR FilterList for getHintForRejectedRow and getSkipHint.
- Add integration test for MultipleColumnPrefixFilter.getSkipHint with
  time-range gating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…HintForRejectedRow

- Track which sub-filters actually rejected via filterRowKey() using a
  boolean[] rejectedByFilterRowKey array (mirrors seekHintFilters pattern).
  getHintForRejectedRow now only consults sub-filters that individually
  returned true from filterRowKey, honouring the per-filter contract.
- Clarify Filter.java javadoc: OR's null-collapse semantic is now
  explicitly documented for both getHintForRejectedRow and getSkipHint.
- Add unit test proving the contract: a non-rejecting sub-filter that
  throws IllegalStateException from getHintForRejectedRow is never called.
- Add divergent-hint tests (unit + integration) asserting AND correctly
  returns max when sub-filters hint to different targets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>