Fix NPE in GapfillProcessor when TIMESERIESON column is not in SELECT list#18820
Conversation
Implements the SQL standard TRANSLATE(string, from, to) function, which
replaces each character in 'from' with the corresponding character in
'to'. Characters beyond the length of 'to' are deleted from the output.
This matches the behavior in PostgreSQL, Oracle, and Trino.
translate('hello', 'aeiou', 'AEIOU') → 'hEllO'
translate('abc', 'abc', 'xy') → 'xy' (c deleted)
translate('abc', 'abc', '') → '' (all deleted)
Adds unit tests covering basic replacement, deletion, no-op (no match),
empty inputs, and duplicate characters in 'from'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… list When a column appears in TIMESERIESON but is omitted from the SELECT list, indexes.get() returned null and auto-unboxing to int threw a silent NullPointerException. Replace the bare unboxing with an explicit null check and throw BadQueryRequestException with a descriptive message so callers get an actionable error instead of a stack trace. Add a regression test in GapfillQueriesTest that asserts the query is rejected with BadQueryRequestException. Fixes apache#9683 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Hey @xiangfu0, would appreciate a review on this when you get a chance! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18820 +/- ##
============================================
- Coverage 64.82% 64.78% -0.05%
- Complexity 1319 1322 +3
============================================
Files 3388 3393 +5
Lines 210228 211073 +845
Branches 32948 33158 +210
============================================
+ Hits 136282 136740 +458
- Misses 62978 63308 +330
- Partials 10968 11025 +57
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Thanks for the contribution
- Revert accidental extra blank lines in StringFunctions.java and StringFunctionsTest.java - Add GapfillUtils.validateGapfillColumns() to check TIMESERIESON columns against SELECT list at query-parse time (before server round-trip) - Call validateGapfillColumns() in BaseSingleStageBrokerRequestHandler during request validation, so invalid queries are rejected before sending to servers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Akanksha-kedia
left a comment
There was a problem hiding this comment.
Thanks for the review, @Jackie-Jiang!
-
Reverted the accidental extra blank lines in
StringFunctions.javaandStringFunctionsTest.java. -
For the early validation: added
GapfillUtils.validateGapfillColumns()that checks each TIMESERIESON column against the SELECT list using theQueryContext. Called it fromBaseSingleStageBrokerRequestHandlerinside the existingvalidateRequesttry-catch block, so the error is returned before any server requests are sent.
| // For gapfill queries, validate that all TIMESERIESON columns are present in the SELECT list | ||
| // before incurring a server round-trip. | ||
| if (pinotQuery != serverPinotQuery) { | ||
| QueryContext outerQueryContext = QueryContextConverterUtils.getQueryContext(pinotQuery); |
There was a problem hiding this comment.
This extra query compilation could be expensive. To avoid the overhead, is it possible to combine the validation into GapfillUtils.stripGapfill()?
There was a problem hiding this comment.
Done! Moved the validation into GapfillUtils.stripGapfill(PinotQuery). It now works directly on the Thrift PinotQuery/Expression objects already in scope — no extra getQueryContext() compilation needed. The QueryContext-based validateGapfillColumns method has been removed.
Per Jackie Jiang's review feedback on PR apache#18820: the previous approach called QueryContextConverterUtils.getQueryContext() inside the broker request handler solely to validate TIMESERIESON columns, incurring an extra query compilation on every gapfill request. The validation is now performed inside stripGapfill(PinotQuery) using the Thrift PinotQuery/Expression objects already in scope, with no additional compilation cost. The QueryContext-based validateGapfillColumns method is removed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Fixes #9683
When a column appears in
TIMESERIESONbut is omitted from theSELECTlist,indexes.get(entityColum.getIdentifier())returnednulland auto-unboxing tointthrew a silentNullPointerExceptionatGapfillProcessor.process.Fix: Replace the bare unbox with an explicit null check. If a
TIMESERIESONcolumn is missing from the result schema, throwBadQueryRequestExceptionwith a descriptive message so callers get an actionable error instead of a stack trace.Changes
GapfillProcessor.java: null-checkindexes.get()result and throwBadQueryRequestExceptionwhen a TIMESERIESON column is absent from the SELECT listGapfillQueriesTest.java: add regression testtestGapfillThrowsOnTimeseriesOnColumnNotInSelectverifying the exception is raisedTest plan
GapfillQueriesTest#testGapfillThrowsOnTimeseriesOnColumnNotInSelect— new test that assertsBadQueryRequestExceptionis thrown for the failing queryGapfillQueriesTestcases continue to pass🤖 Generated with Claude Code