Skip to content

Fix NPE in GapfillProcessor when TIMESERIESON column is not in SELECT list#18820

Merged
Jackie-Jiang merged 4 commits into
apache:masterfrom
Akanksha-kedia:fix/gapfill-timeserieson-npe-9683
Jun 24, 2026
Merged

Fix NPE in GapfillProcessor when TIMESERIESON column is not in SELECT list#18820
Jackie-Jiang merged 4 commits into
apache:masterfrom
Akanksha-kedia:fix/gapfill-timeserieson-npe-9683

Conversation

@Akanksha-kedia

Copy link
Copy Markdown
Contributor

Summary

Fixes #9683

When a column appears in TIMESERIESON but is omitted from the SELECT list, indexes.get(entityColum.getIdentifier()) returned null and auto-unboxing to int threw a silent NullPointerException at GapfillProcessor.process.

Fix: Replace the bare unbox with an explicit null check. If a TIMESERIESON column is missing from the result schema, throw BadQueryRequestException with a descriptive message so callers get an actionable error instead of a stack trace.

Changes

  • GapfillProcessor.java: null-check indexes.get() result and throw BadQueryRequestException when a TIMESERIESON column is absent from the SELECT list
  • GapfillQueriesTest.java: add regression test testGapfillThrowsOnTimeseriesOnColumnNotInSelect verifying the exception is raised

Test plan

  • GapfillQueriesTest#testGapfillThrowsOnTimeseriesOnColumnNotInSelect — new test that asserts BadQueryRequestException is thrown for the failing query
  • All existing GapfillQueriesTest cases continue to pass

🤖 Generated with Claude Code

Akanksha-kedia and others added 2 commits June 19, 2026 09:54
Implements the SQL standard TRANSLATE(string, from, to) function, which
replaces each character in 'from' with the corresponding character in
'to'. Characters beyond the length of 'to' are deleted from the output.
This matches the behavior in PostgreSQL, Oracle, and Trino.

  translate('hello', 'aeiou', 'AEIOU')  → 'hEllO'
  translate('abc',   'abc',   'xy')     → 'xy'    (c deleted)
  translate('abc',   'abc',   '')       → ''      (all deleted)

Adds unit tests covering basic replacement, deletion, no-op (no match),
empty inputs, and duplicate characters in 'from'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… list

When a column appears in TIMESERIESON but is omitted from the SELECT
list, indexes.get() returned null and auto-unboxing to int threw a
silent NullPointerException. Replace the bare unboxing with an explicit
null check and throw BadQueryRequestException with a descriptive message
so callers get an actionable error instead of a stack trace.

Add a regression test in GapfillQueriesTest that asserts the query is
rejected with BadQueryRequestException.

Fixes apache#9683

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Akanksha-kedia

Copy link
Copy Markdown
Contributor Author

Hey @xiangfu0, would appreciate a review on this when you get a chance!

@codecov-commenter

codecov-commenter commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 69.23077% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.78%. Comparing base (5526d5b) to head (3dfd809).
⚠️ Report is 23 commits behind head on master.

Files with missing lines Patch % Lines
.../java/org/apache/pinot/core/util/GapfillUtils.java 70.83% 3 Missing and 11 partials ⚠️
...ache/pinot/core/query/reduce/GapfillProcessor.java 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18820      +/-   ##
============================================
- Coverage     64.82%   64.78%   -0.05%     
- Complexity     1319     1322       +3     
============================================
  Files          3388     3393       +5     
  Lines        210228   211073     +845     
  Branches      32948    33158     +210     
============================================
+ Hits         136282   136740     +458     
- Misses        62978    63308     +330     
- Partials      10968    11025      +57     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.78% <69.23%> (-0.05%) ⬇️
temurin 64.78% <69.23%> (-0.05%) ⬇️
unittests 64.78% <69.23%> (-0.05%) ⬇️
unittests1 56.99% <69.23%> (-0.02%) ⬇️
unittests2 37.17% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Jackie-Jiang Jackie-Jiang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution

- Revert accidental extra blank lines in StringFunctions.java and StringFunctionsTest.java
- Add GapfillUtils.validateGapfillColumns() to check TIMESERIESON columns against
  SELECT list at query-parse time (before server round-trip)
- Call validateGapfillColumns() in BaseSingleStageBrokerRequestHandler during request
  validation, so invalid queries are rejected before sending to servers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Akanksha-kedia

Copy link
Copy Markdown
Contributor Author

@Jackie-Jiang

@Akanksha-kedia Akanksha-kedia left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @Jackie-Jiang!

  1. Reverted the accidental extra blank lines in StringFunctions.java and StringFunctionsTest.java.

  2. For the early validation: added GapfillUtils.validateGapfillColumns() that checks each TIMESERIESON column against the SELECT list using the QueryContext. Called it from BaseSingleStageBrokerRequestHandler inside the existing validateRequest try-catch block, so the error is returned before any server requests are sent.

// For gapfill queries, validate that all TIMESERIESON columns are present in the SELECT list
// before incurring a server round-trip.
if (pinotQuery != serverPinotQuery) {
QueryContext outerQueryContext = QueryContextConverterUtils.getQueryContext(pinotQuery);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra query compilation could be expensive. To avoid the overhead, is it possible to combine the validation into GapfillUtils.stripGapfill()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Moved the validation into GapfillUtils.stripGapfill(PinotQuery). It now works directly on the Thrift PinotQuery/Expression objects already in scope — no extra getQueryContext() compilation needed. The QueryContext-based validateGapfillColumns method has been removed.

Per Jackie Jiang's review feedback on PR apache#18820: the previous approach
called QueryContextConverterUtils.getQueryContext() inside the broker
request handler solely to validate TIMESERIESON columns, incurring an
extra query compilation on every gapfill request.

The validation is now performed inside stripGapfill(PinotQuery) using
the Thrift PinotQuery/Expression objects already in scope, with no
additional compilation cost. The QueryContext-based validateGapfillColumns
method is removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Jackie-Jiang Jackie-Jiang added bug Something is not working as expected query Related to query processing labels Jun 24, 2026
@Jackie-Jiang Jackie-Jiang merged commit ca608fa into apache:master Jun 24, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is not working as expected query Related to query processing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Gapfill] NPE when a column is included in TIMESERIESON but not selected

3 participants