[VL] Skip escape arg when offloading Like with no-backslash pattern #12152
Open
zhulipeng wants to merge 1 commit into
Spark's Like node always carries escapeChar (defaulting to '\') even when the SQL did not specify ESCAPE. Always sending the 3-arg form to Velox forces makeLike (Re2Functions.cpp) onto the escape-aware path: parsePattern runs an extra unescape pass and determinePatternKind runs with escapeChar.has_value() == true, even when no escaping is needed.

When the pattern literal contains no '\', the 2-arg and 3-arg forms are semantically identical, so emit the cheaper 2-arg form. Velox already registers both signatures via likeSignatures().

Performance: TPC-H Q13 @ 6 TB shows >6% end-to-end latency reduction. With the 3-arg form, the constant-pattern fast paths in determinePatternKind are bypassed and Velox falls back to LikeWithRe2, hot-looping in re2::DFA::InlinedSearchLoop (>8% of total cycles on Q13). Sending the 2-arg form lets determinePatternKind dispatch '%special%requests%' to OptimizedLike<kSubstrings>, eliminating the RE2 DFA cost.

Generated-by: Claude claude-opus-4-7
Co-authored-by: Guo Wangyang <wangyang.guo@intel.com>
Co-authored-by: Hengrui Hu <hengrui.hu@intel.com>
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
What changes are proposed in this pull request?
When offloading `Like` to Velox, omit the escape literal argument and emit the 2-arg form (`like(input, pattern)`) when both:
- the pattern is a `Literal`, and
- the escape character is `\` AND the pattern does not contain `\`.

Otherwise we still emit the 3-arg form (`like(input, pattern, escape)`) as before.
Why are the changes needed?
Spark's `Like` node always carries an `escapeChar` (defaulting to `\`) even when the SQL did not specify `ESCAPE`. We previously always sent the 3-arg form to Velox, which forces Velox's `makeLike` (`Re2Functions.cpp`) to take the escape-aware path: `parsePattern` runs an extra unescape pass, and `determinePatternKind` runs with `escapeChar.has_value() == true`, even when no actual escaping is needed.
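The selection rule proposed in this PR can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the Scala code in the patch: `choose_like_args` is a hypothetical helper, and modelling a non-literal pattern as `None` is an assumption made only for this sketch.

```python
from typing import Optional, Tuple


def choose_like_args(pattern: Optional[str], escape: str) -> Tuple[str, ...]:
    """Return the argument names to emit for the offloaded like() call.

    `pattern` is the pattern text when it is a literal, or None when it is
    a non-literal expression whose contents are unknown at plan time
    (an assumption of this sketch).
    """
    if pattern is not None and escape == "\\" and "\\" not in pattern:
        # The default escape character cannot affect this pattern,
        # so the escape argument can be dropped.
        return ("input", "pattern")
    # Custom escape, non-literal pattern, or a pattern containing a backslash:
    # keep the 3-arg form.
    return ("input", "pattern", "escape")
```

With this rule, `'%special%requests%'` under the default escape takes the 2-arg form, while a pattern such as `'100\%'`, a custom `ESCAPE '#'`, or a non-literal pattern keeps the 3-arg form.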
When the pattern literal contains no `\`, the 2-arg and 3-arg forms are semantically identical, so we can safely send the cheaper 2-arg form. Velox already registers both signatures via `likeSignatures()`.
How was this patch tested?
- `Like` query coverage in `VeloxStringFunctionsSuite` (like/rlike/ilike): query results unchanged.
- `./dev/format-scala-code.sh` clean.
Performance
TPC-H Q13 @ 6 TB scale: >6% end-to-end latency reduction.
Q13's `o_comment NOT LIKE '%special%requests%'` filter scans every orders row. With the 3-arg form, the constant-pattern fast paths in `determinePatternKind` are bypassed and Velox falls back to the generic `LikeWithRe2` path, which hot-loops in `re2::DFA::InlinedSearchLoop`; CPU profiling shows `InlinedSearchLoop` accounts for >8% of total cycles on Q13.
Sending the 2-arg form lets `determinePatternKind` recognize this as the `kSubstrings` shape and dispatch to the dedicated `OptimizedLike<kSubstrings>` kernel, eliminating the RE2 DFA cost. No regression observed on other queries.
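To illustrate why a pattern of this shape does not need a regex engine, here is a minimal Python sketch of matching `'%a%b%'`-style patterns with plain substring search. It is not Velox's `OptimizedLike<kSubstrings>` implementation; `match_substrings` is a hypothetical name, and the sketch assumes the pattern starts and ends with `%` and contains neither `_` nor an escape character.

```python
def match_substrings(text: str, pattern: str) -> bool:
    """Match a '%a%b%'-shaped LIKE pattern using only substring search.

    Assumes (for this sketch) that the pattern starts and ends with '%'
    and contains no '_' wildcard and no escape character.
    """
    if not (pattern.startswith("%") and pattern.endswith("%")) or "_" in pattern:
        raise ValueError("not a substrings-shaped pattern")

    pos = 0
    for piece in pattern.split("%"):
        if not piece:
            continue  # empty pieces come from leading, trailing, or doubled '%'
        found = text.find(piece, pos)
        if found < 0:
            return False
        # Continue searching after this occurrence so pieces match in order.
        pos = found + len(piece)
    return True
```

Taking the leftmost occurrence of each piece is sufficient here: if any in-order match exists, the leftmost choice leaves at least as much of the remaining text for the later pieces.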
Was this patch authored or co-authored using generative AI tooling?
Reviewed-by: Claude claude-opus-4-7