perf(query): narrow + materialize before group for WHERE + multi-key #228

Merged: ser-vasilich merged 1 commit into master from perf/prefilter-multi-key on Jun 5, 2026

@ser-vasilich
Collaborator

Summary

The pre-filter narrowing path in ray_select_fn previously fired only
when the by-dict contained a computed val (e.g. q42's
(xbar EventTime ...)). For bare-ref multi-key shapes with a
selective WHERE — e.g. ClickBench q30 / q31 — the planner left the
work to the fused mk_par_v2 path. That path reads each by-key + agg
input column from the original wide table at the sparse positions
left by the WHERE bitmap. On the 100+ column hits table at ~14%
selectivity, that gather wastes a cache line per touched column per
passing row.

Narrowing the input down to just the referenced columns and filtering
once gives the downstream group a dense column-store and skips the
gather.
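
A minimal sketch of the narrow + materialize idea, for illustration only. It is not the engine's code: the dict-of-lists table layout and the names `narrow_and_materialize` and `group_sum` are assumptions made for this example. It projects the table down to the referenced columns, applies the WHERE mask once, and then groups over the resulting dense columns.

```python
from collections import defaultdict

# Hypothetical helpers; names and data layout are illustrative, not the engine's.
def narrow_and_materialize(table, referenced, mask):
    """Keep only the referenced columns and filter each one once by the mask."""
    return {
        name: [v for v, keep in zip(table[name], mask) if keep]
        for name in referenced
    }

def group_sum(dense, by_keys, agg_col):
    """Group dense columns by a multi-key tuple; return {key: (count, sum)}."""
    acc = defaultdict(lambda: [0, 0])
    for i in range(len(dense[agg_col])):
        key = tuple(dense[k][i] for k in by_keys)
        slot = acc[key]
        slot[0] += 1
        slot[1] += dense[agg_col][i]
    return {k: tuple(v) for k, v in acc.items()}

# Toy "wide" table: the query references only three of its columns.
table = {
    "SearchEngineID": [1, 1, 2, 2, 1],
    "ClientIP":       [10, 10, 20, 20, 10],
    "IsRefresh":      [0, 1, 1, 0, 1],
    "SearchPhrase":   ["a", "", "b", "c", "d"],
    "Unused1":        [0] * 5,
    "Unused2":        [0] * 5,
}

mask = [p != "" for p in table["SearchPhrase"]]  # WHERE SearchPhrase != ""
dense = narrow_and_materialize(
    table, ["SearchEngineID", "ClientIP", "IsRefresh"], mask
)
result = group_sum(dense, ["SearchEngineID", "ClientIP"], "IsRefresh")
```

After the filter, the group step walks contiguous positions `0..n-1` of each narrowed column instead of jumping between sparse row positions in the wide table.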

Extend the gate to fire when:

  • the desc:/asc: COUNT take N shape matches,
  • by-vals are all bare column refs (no computed val),
  • the by-dict has ≥ 2 keys, and
  • at least one aggregate has an input column distinct from the
    by-keys (sum / min / max / avg, not pure count).

The count-only-on-a-by-key shapes (q14: count SearchPhrase by
{SearchEngineID, SearchPhrase}; q40: count URLHash by
{URLHash, EventDate}) reuse the by-key column for the count input
— narrowing costs the projection without saving the gather, so the
gate keeps them on the original fused path.
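
The extended gate can be read as a predicate along the following lines. This is a hedged sketch of the conditions listed above, not the actual check in ray_select_fn: the `Agg` type, the `should_narrow` name, and its parameters are hypothetical, and treating "distinct input column" as "a sum/min/max/avg whose input is not a by-key" is one reading of the last bullet. The pre-existing computed-val gate is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Agg:
    op: str      # "count", "sum", "min", "max" or "avg"
    column: str  # input column of the aggregate

# Aggregates that read an input column (assumed set, taken from the bullet list).
VALUE_AGGS = {"sum", "min", "max", "avg"}

def should_narrow(order_by_count_take, by_vals_are_bare_refs, by_keys, aggs):
    """Hypothetical gate: True when the narrow + materialize path should fire."""
    if not order_by_count_take:      # desc:/asc: COUNT take N shape
        return False
    if not by_vals_are_bare_refs:    # no computed by-val
        return False
    if len(by_keys) < 2:             # multi-key only
        return False
    # At least one value aggregate must read a column that is not a by-key.
    return any(a.op in VALUE_AGGS and a.column not in by_keys for a in aggs)

# q30-like shape: sum/avg read columns outside the by-keys, so the gate fires.
q30 = should_narrow(
    True, True, ["SearchEngineID", "ClientIP"],
    [Agg("count", "ClientIP"), Agg("sum", "IsRefresh"),
     Agg("avg", "ResolutionWidth")],
)

# q14 / q40-like shapes: count over a by-key only, so they stay on the fused path.
q14 = should_narrow(
    True, True, ["SearchEngineID", "SearchPhrase"],
    [Agg("count", "SearchPhrase")],
)
q40 = should_narrow(
    True, True, ["URLHash", "EventDate"], [Agg("count", "URLHash")]
)
```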

ClickBench 10M (on top of the gate from #227):

q30  ~152 →  ~50 ms   (-67%)
q31  ~353 →  ~55 ms   (-84%, -298 ms)

Full 43-query sum (with #227 already in): -394 ms / -11.3%.
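
As a quick sanity check, the percentages above follow from the approximate timings. The `pct_change` helper is hypothetical, and the implied full-suite baseline is derived from the two reported figures rather than measured.

```python
def pct_change(before_ms, after_ms):
    """Relative change in percent; negative means faster."""
    return 100.0 * (after_ms - before_ms) / before_ms

q30_pct = pct_change(152, 50)   # roughly -67%
q31_pct = pct_change(353, 55)   # roughly -84%

# -394 ms being -11.3% of the total implies a baseline of about 3.5 s
# for the 43-query sum (derived, not a reported measurement).
implied_baseline_ms = 394 / 0.113
```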

Tests: 3232/3234 pass (unchanged).

The pre-filter narrowing path in ray_select_fn previously fired only
when the by-dict contained a *computed* val (e.g. q42's
(xbar EventTime ...)).  For bare-ref multi-key shapes with a
selective WHERE — e.g. ClickBench q30 / q31:

  (select {c: (count ClientIP)
           s: (sum IsRefresh)
           a: (avg ResolutionWidth)
           from: hits where: (!= SearchPhrase "")
           by: {SearchEngineID: SearchEngineID ClientIP: ClientIP}
           desc: c take: 10})

the planner left the work to the fused mk_par_v2 path.  That path
reads each by-key + agg input column from the *original* wide table
at the sparse positions left by the WHERE bitmap.  On hits (100+
columns) at ~14% selectivity, the gather wastes a cache line per
touched column per passing row.  Narrowing the input down to just
the referenced columns and filtering once gives the downstream group
a dense column-store and skips the gather.

Extend the gate to fire when:
  - the desc/asc COUNT take N shape matches,
  - by-vals are all bare column refs (no computed val),
  - the by-dict has ≥ 2 keys, and
  - at least one aggregate has an input column distinct from the
    by-keys (sum/min/max/avg, not pure count).

Count-only shapes (q14: count of SearchPhrase by
{SearchEngineID, SearchPhrase}; q40: count URLHash by
{URLHash, EventDate}) reuse the by-key column for the count input
— narrowing costs the projection without saving the gather, so the
gate keeps them on the original fused path.

ClickBench 10M:

  q30  ~152 →  ~50 ms
  q31  ~353 →  ~55 ms

@ser-vasilich merged commit 5879294 into master on Jun 5, 2026
4 checks passed