Improve FSST LIKE contains handling #8573
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Merging this PR will degrade performance by 36.13%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | fsst_contains[path] | 5 ms | 10.9 ms | -54.35% |
| ❌ | Simulation | fsst_contains[email] | 4.4 ms | 9.5 ms | -53.28% |
| ❌ | Simulation | like_substr_high_match | 8.1 ms | 16.6 ms | -51% |
| ❌ | Simulation | fsst_contains[json] | 13.9 ms | 28.3 ms | -50.99% |
| ❌ | Simulation | fsst_contains[cb] | 15.5 ms | 30.4 ms | -49.12% |
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 16.3 µs | 26.9 µs | -39.46% |
| ❌ | Simulation | fsst_contains[log] | 25.4 ms | 40.5 ms | -37.35% |
| ❌ | Simulation | fsst_contains[urls] | 10.4 ms | 15 ms | -30.87% |
| ❌ | Simulation | slice_empty_vortex | 310 ns | 368.3 ns | -15.84% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 304.7 ns | 275.6 ns | +10.58% |
Comparing ngates/fsst-like-pushdown (d15418f) with develop (2a19323)
Footnotes
1. 4 benchmarks were skipped, so the baseline results were used instead.
Rationale for this change
FSST's compressed DFA path is not always the fastest path for short `%needle%` LIKE patterns. For short substring needles, decoding the FSST array once and using the existing canonical LIKE implementation is faster than walking the compressed byte stream through the DFA.

This also makes `Shared` transparent to parent reductions so FSST parent kernels are still reached when arrays are wrapped by the shared cache layer.

What changes are included in this PR?
Short contains-style FSST LIKE patterns now fall back to canonicalized LIKE evaluation. The FSST `LikeKind` parser is made crate-visible so the LIKE kernel can choose that path before constructing the DFA matcher.

The PR also adds a regression test covering FSST parent kernels through `SharedArray`.

What APIs are changed? Are there any user-facing changes?
No public API changes. The behavior change is internal query execution: matching results are unchanged, but some FSST LIKE predicates should execute faster.
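
For reviewers, the short-needle fallback described under "What changes are included in this PR?" can be sketched roughly as below. This is an illustrative, self-contained approximation and not the code in this PR: the `LikeKind` variants, the `Strategy` enum, the `choose_strategy` function, and the `SHORT_NEEDLE_MAX_LEN` threshold are all assumptions made for the example, and the handling of unsupported patterns (routing them to the canonical path) is likewise assumed.

```rust
/// Hypothetical classification of a LIKE pattern. The variant names are
/// assumptions for this sketch, not the crate's actual definition.
#[derive(Debug, PartialEq)]
enum LikeKind<'a> {
    /// `prefix%`
    Prefix(&'a str),
    /// `%needle%`
    Contains(&'a str),
}

impl<'a> LikeKind<'a> {
    /// Classify simple patterns. Returns `None` for anything using `_`,
    /// a backslash escape, or a `%` in an unsupported position.
    fn parse(pattern: &'a str) -> Option<Self> {
        if pattern.contains('_') || pattern.contains('\\') {
            return None;
        }
        if let Some(inner) = pattern
            .strip_prefix('%')
            .and_then(|p| p.strip_suffix('%'))
        {
            if !inner.is_empty() && !inner.contains('%') {
                return Some(LikeKind::Contains(inner));
            }
            return None;
        }
        if let Some(prefix) = pattern.strip_suffix('%') {
            if !prefix.contains('%') {
                return Some(LikeKind::Prefix(prefix));
            }
        }
        None
    }
}

/// Which evaluation path the LIKE kernel would take (hypothetical).
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Walk the compressed byte stream through the DFA matcher.
    CompressedDfa,
    /// Decode the FSST array once and run the canonical LIKE implementation.
    DecodeThenLike,
}

/// Assumed cutoff for a "short" needle; the real threshold may differ.
const SHORT_NEEDLE_MAX_LEN: usize = 8;

/// Pick a path before any DFA is constructed.
fn choose_strategy(pattern: &str) -> Strategy {
    match LikeKind::parse(pattern) {
        // Short `%needle%` patterns skip DFA construction entirely.
        Some(LikeKind::Contains(needle)) if needle.len() <= SHORT_NEEDLE_MAX_LEN => {
            Strategy::DecodeThenLike
        }
        // Other recognized shapes stay on the compressed path.
        Some(_) => Strategy::CompressedDfa,
        // Unrecognized patterns use the canonical implementation (assumed).
        None => Strategy::DecodeThenLike,
    }
}
```

Under these assumptions, `choose_strategy("%ab%")` selects the decode path, while a longer needle or a `prefix%` pattern stays on the compressed DFA path. Parsing the pattern first is what lets the kernel avoid paying for DFA construction when it will not be used.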