perf(query): narrow + materialize before group for WHERE + multi-key by ser-vasilich · Pull Request #228 · RayforceDB/rayforce

ser-vasilich · 2026-06-05T17:48:30Z

Summary

The pre-filter narrowing path in ray_select_fn previously fired only
when the by-dict contained a computed val (e.g. q42's
(xbar EventTime ...)). For bare-ref multi-key shapes with a
selective WHERE — e.g. ClickBench q30 / q31 — the planner left the
work to the fused mk_par_v2 path. That path reads each by-key + agg
input column from the original wide table at the sparse positions
left by the WHERE bitmap. On the 100-column hits table with ~14%
selectivity, the gather wastes a cache line per touched column per
passing row.

Narrowing the input down to just the referenced columns and filtering
once gives the downstream group a dense column-store and skips the
gather.
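A minimal sketch of the narrow + materialize step follows. It assumes a hypothetical wide table of `int64_t` columns and a WHERE result packed as one bit per row. `wide_table`, `row_passes` and `narrow_materialize` are illustrative names, not Rayforce's types or functions. Each referenced column is read sparsely once, and the result is a dense array per column that a group-by can scan sequentially.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wide column store: cols[c][r] is column c at row r. */
typedef struct {
    size_t nrows;
    const int64_t **cols;
} wide_table;

/* Hypothetical WHERE bitmap layout: bit r set means row r passed. */
static int row_passes(const uint64_t *bitmap, size_t r) {
    return (int)((bitmap[r / 64] >> (r % 64)) & 1u);
}

/* Narrow + materialize: for each referenced column refs[k], copy the values
 * of the passing rows into a dense malloc'd array out[k]. The sparse reads
 * against the wide table happen once here; a downstream group-by can then
 * scan out[0..nrefs-1] sequentially. Returns 0 on success, -1 if malloc
 * fails (nothing is left allocated in that case). */
static int narrow_materialize(const wide_table *t, const uint64_t *bitmap,
                              const size_t *refs, size_t nrefs,
                              int64_t **out, size_t *out_len) {
    size_t n = 0;
    for (size_t r = 0; r < t->nrows; r++)
        n += (size_t)row_passes(bitmap, r);

    for (size_t k = 0; k < nrefs; k++) {
        out[k] = malloc((n ? n : 1) * sizeof(int64_t));
        if (out[k] == NULL) {
            while (k-- > 0) free(out[k]);   /* release columns already built */
            return -1;
        }
        /* Column-at-a-time copy keeps the writes to out[k] sequential. */
        const int64_t *src = t->cols[refs[k]];
        size_t w = 0;
        for (size_t r = 0; r < t->nrows; r++)
            if (row_passes(bitmap, r)) out[k][w++] = src[r];
    }
    *out_len = n;
    return 0;
}
```

Columns that no by-key or aggregate references are never touched, which is where the saving on a 100-column table comes from.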

Extend the gate to fire when:

- the desc:/asc: COUNT take N shape matches,
- by-vals are all bare column refs (no computed val),
- the by-dict has ≥ 2 keys, and
- at least one aggregate has an input column distinct from the by-keys (sum / min / max / avg, not pure count).
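The gate conditions can be read as a single predicate. The sketch below works on a hypothetical, simplified `select_shape` struct rather than the objects `ray_select_fn` actually inspects. Two points are interpretations, not taken from the code: a WHERE is assumed to be required (the title scopes the change to WHERE + multi-key), and "input column distinct from the by-keys" is read as a non-count aggregate whose input is not one of the by-keys.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical aggregate kinds; not Rayforce's enum. */
typedef enum { AGG_COUNT, AGG_SUM, AGG_MIN, AGG_MAX, AGG_AVG } agg_op;

typedef struct {
    agg_op op;
    const char *input;            /* name of the aggregate's input column */
} agg_spec;

/* Hypothetical summary of a select's shape. */
typedef struct {
    bool has_where;               /* assumed: path only applies with a WHERE */
    bool count_sort_take_n;       /* desc:/asc: on a COUNT plus take: N */
    bool by_vals_all_bare_refs;   /* no computed by-val such as (xbar ...) */
    const char **by_keys;
    size_t nkeys;
    const agg_spec *aggs;
    size_t naggs;
} select_shape;

static bool is_by_key(const select_shape *s, const char *col) {
    for (size_t i = 0; i < s->nkeys; i++)
        if (strcmp(s->by_keys[i], col) == 0) return true;
    return false;
}

/* True when the bare-ref multi-key case should take the narrowing path. */
static bool prefilter_multikey_gate(const select_shape *s) {
    if (!s->has_where || !s->count_sort_take_n) return false;
    if (!s->by_vals_all_bare_refs || s->nkeys < 2) return false;
    /* Need one non-count aggregate reading a column outside the by-keys;
     * count-only shapes (q14, q40) fall through and stay on the fused path. */
    for (size_t i = 0; i < s->naggs; i++)
        if (s->aggs[i].op != AGG_COUNT && !is_by_key(s, s->aggs[i].input))
            return true;
    return false;
}
```

With the q30 aggregates (count of ClientIP, sum of IsRefresh, avg of ResolutionWidth) the sum and avg satisfy the last condition; a count over a by-key alone does not.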

The count-only-on-a-by-key shapes (q14: count SearchPhrase by
{SearchEngineID, SearchPhrase}; q40: count URLHash by
{URLHash, EventDate}) reuse the by-key column for the count input
— narrowing costs the projection without saving the gather, so the
gate keeps them on the original fused path.

ClickBench 10M (on top of the gate from #227):

q30  ~152 → ~50 ms  (-67%)
q31  ~353 → ~55 ms  (-84%, -298 ms)

Full 43-query sum (with #227 already in): -394 ms / -11.3%.
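The relative numbers above follow from the absolute timings. The two helpers below are only a worked check of that arithmetic on the figures reported in this PR; the function names are illustrative.

```c
/* Percentage change from before to after (negative = faster). */
static double pct_change(double before_ms, double after_ms) {
    return (after_ms - before_ms) / before_ms * 100.0;
}

/* Baseline implied by an absolute saving and the matching relative saving. */
static double implied_baseline_ms(double saved_ms, double saved_pct) {
    return saved_ms / (saved_pct / 100.0);
}
```

pct_change(152, 50) is about -67.1% and pct_change(353, 55) about -84.4%, matching the rounded -67% and -84%; 353 - 55 gives the 298 ms. implied_baseline_ms(394, 11.3) puts the 43-query sum before this change at roughly 3.5 s, within the rounding of the -11.3% figure.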

Tests: 3232/3234 pass (unchanged).

Example of the q30 / q31 query shape:

(select {c: (count ClientIP) s: (sum IsRefresh) a: (avg ResolutionWidth) from: hits where: (!= SearchPhrase "") by: {SearchEngineID: SearchEngineID ClientIP: ClientIP} desc: c take: 10})
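For the q30 / q31 query, the columns the group actually needs are the by-keys plus the aggregate inputs, deduplicated. The sketch below computes that set from plain column names; `add_unique` and `group_input_columns` are hypothetical helpers, and the WHERE column (SearchPhrase) is left out on the assumption that it is consumed by the filter itself.

```c
#include <stddef.h>
#include <string.h>

/* Append name to set unless already present; returns the new size.
 * Linear dedup is fine for a handful of columns. Caller ensures capacity. */
static size_t add_unique(const char **set, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0) return n;
    set[n] = name;
    return n + 1;
}

/* Columns the narrowed table must carry into the group: by-keys first,
 * then any aggregate inputs not already among them. */
static size_t group_input_columns(const char **by_keys, size_t nk,
                                  const char **agg_inputs, size_t na,
                                  const char **out) {
    size_t n = 0;
    for (size_t i = 0; i < nk; i++) n = add_unique(out, n, by_keys[i]);
    for (size_t i = 0; i < na; i++) n = add_unique(out, n, agg_inputs[i]);
    return n;
}
```

For q30 this yields four columns (SearchEngineID, ClientIP, IsRefresh, ResolutionWidth) out of 100+; count ClientIP adds nothing because ClientIP is already a key. For a count-only shape like q14 the set is just the two by-keys, which is why narrowing buys little there.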

ser-vasilich merged commit 5879294 into master Jun 5, 2026
4 checks passed

ser-vasilich merged 1 commit into master from perf/prefilter-multi-key
