fix: 9 engine bugs surfaced by a weak-assertion coverage audit#226
Merged
Conversation
…(wave 1)
Audit of a catastrophic false-positive-coverage anti-pattern: tests like
`(type (min ['to 'ab 'c])) -- 'sym` or `(count (select …)) -- N` mark a
function as "covered" in llvm-cov while never verifying its actual
result value — so a function returning wrong values would silently pass.
Five parallel agents swept five function categories and added ~290
value-level assertions across 60 RFL files, pairing each surviving
type/count check with a companion value assertion computed by hand from
the input fixture:
math/agg min/max/sum/avg/var/stddev/first/last/top/bot/abs/neg/
sqrt/log/exp/round/floor/ceil/pow (~70 asserts)
collection at/take/distinct/asc/desc/concat/union/except/find/
rank/group-keys (~50 asserts)
cast/serde as (de-tautologised ~40 (type (as 'T x))->'T), de/ser
round-trips, parse/eval, temporal ctors (~80 asserts)
query select/group/pivot/datalog/like/opt result values, not
just row counts (~50 asserts)
domain knn/ann row identities, joins (inner/left/anti/asof)
column values, graph algo results (~40 asserts)
No src/ bugs were found: every mismatch investigated turned out to be
documented k/q convention (sum widens narrow ints to i64; null is a
distinct least value so (!= 0N 0)=true, (< 0N 0)=true, (== 0N 0)=false).
The fused_group_parity (!= v 0) case is now asserted at its correct
q-convention value (sum 5) with an explanatory comment rather than an
xfail.
Suite: 3232 of 3234 pass (2 skipped, 0 failed) under ASan+UBSan.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(wave 2)
Continue the false-positive-coverage audit. 90 RFL files had sites that
asserted only a function's result TYPE or row/group COUNT, never its
VALUE -- so llvm-cov marked the code "covered" while wrong values passed
silently. Each weak site now carries a hand-computed value check beside
(never replacing) the original type/count line.
Per-group results are verified order-independently via key-masked SOURCE
lookup (select {agg by: k from: T where: (== k <key>)}) because group
emission order is non-deterministic. Float results use tolerance.
Real engine bugs surfaced by the stronger checks are pinned LIVE as
xfail (correct value in a ;; xfail BUG: comment, current buggy value
asserted) so the suite stays green while the bug stays visible:
- I32 update memory corruption (non-selected rows clobbered)
- I32 typed-scalar WHERE reads atom as 0 -> wrong group (3 repros)
- parted STR/SYM all-pass filter returns un-materialized column
- negative-zero F64 pivot zero-fills value column / collapses key cols
- multi-key no-agg by-group ignores WHERE; drops non-key LIST column
- non-by scalar pearson_corr returns 0 instead of correlation
- narrow-int min/max wrong at group index >= 1
Full suite: 3232/3234 pass, 2 skipped, 0 failed (clean ASan/UBSan).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two group-by aggregation bugs surfaced by the weak-assertion audit. 1. Narrow-int min/max wrong past group index 0. The emit path stored results with a fixed 8-byte int64 write, overshooting a 4/2/1-byte I32/I16/U8 output column so every group after index 0 landed at the wrong offset (correct only at gi==0; single-group masked queries always correct). Store at the output element width via topk_write_i64 + ray_sym_elem_size, in both the DA and HT emitters. The all-null-group null sentinel is likewise widened (agg_int_null_sentinel_for) so it is not truncated to 0. 2. Non-by scalar pearson_corr returned integer 0. The n_keys==0 scalar fast-path lacked the binary-aggregator guard the DA path already has, so Pearson accumulated nothing and fell to the integer emit default. Skip the scalar fast-path when a binary aggregator is present, routing to the HT reducer that computes the cross-terms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three query-execution-path bugs surfaced by the weak-assertion audit.
1. I32 typed-atom mishandling (two symptoms, two root causes, both I32-only
because 4 != 8).
a. WHERE-branch UPDATE memory corruption. The merge loop read the source
column at a hardcoded 8-byte stride; for an I32 column (4-byte elems)
this mis-indexed and read past the end, clobbering the rows NOT selected
by WHERE -- e.g. update {v:0Ni from:t where:(==k 1i)} over k=[1 2 3]
v=[10 20 30] yielded v=[null 30 garbage] instead of [null 20 30]. Stride
by ray_elem_size(ct). (query.c)
b. Typed I32 scalar in a WHERE filter read as 0. The fused-expr unary CAST
I32->I64 branch had no I32 source case and fell into the f64 fallback,
reinterpreting the 4-byte int as a double (small values -> 0), so
select {from:t where:(== k (as 'I32 7))} selected the k=0 group. Add
I32/I16/DATE/TIME source arms. (expr.c)
2. Multi-key no-agg by-group ignored WHERE. select {by:[k1 k2] from:t where:...}
with no aggregate routed to a computed-key fallback that re-evaluated the
by SYM-vector as a literal symbol list, grouping the UNFILTERED columns
(one group per key name, not per distinct composite value). Route the
n_out==0 multi-key case to eval-level grouping (mirroring the existing GUID
case), which groups the filtered rows by the real composite key. (query.c)
Follow-up: the eval-group path returns only key columns for this shape; the
first-of-group non-key columns are carried by the next commit.
3. Non-key LIST column dropped under WHERE. sel_compact allocated result
columns with typed_vec_new, but RAY_LIST==0 is rejected by ray_vec_new, so
the column was left NULL and silently dropped (INT columns survived; LIST-
specific) -- select {by:k from:t where:...} lost a non-key LIST column.
Allocate LIST via ray_list_new and gather its element pointers with a
retain, excluding LIST from the non-retaining byte-copy gather paths to
avoid an alias/double-free. (filter.c)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
An all-pass filter on a splayed/parted table (e.g. where: (like s "*"), or a select with no where:) hit the no-selection short-circuit and returned the source table verbatim, leaving its columns in segmented (RAY_IS_PARTED) / RAY_MAPCOMMON form. A subsequent (at result 'col) then raised a type error, while a partial filter (non-NULL selection) flattened correctly via sel_compact -- the defect was specific to the no-selection finalization path. Flatten residual parted/mapcommon columns at finalization into a freshly built table (never mutating the possibly-stored source), matching the partial-filter path. Reuses the OP_SCAN cold-path flatten via a new flatten_parted_col helper. Flat in-memory tables take the zero-copy short-circuit unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the multi-key no-agg WHERE fix. After routing
select {by:[k1 k2] from:t} (no aggregate) to the eval-group path, the result
carried only the key columns -- the single-key path returns the first-of-group
value of every non-key column, so the multi-key shape silently dropped them
(e.g. column 'val).
Gather the first-of-group value of each non-key column for the n_out==0
multi-key case, mirroring the single-key first-of-group gather (typed
STR/LIST/scalar copy preserving native type and nulls), rather than
ray_at_fn which wraps a vector index into a LIST. select {by:[k1 k2] from:t}
now returns the key columns plus native-typed first-of-group non-key columns,
matching select {by:k1 from:t}.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CI (macos + ubuntu) surfaced three audit assertions that pinned a specific emission order which is actually platform-dependent: - db_mount `(key M)` / `(key Mp)`: a directory-mount dict is keyed by table names in filesystem readdir order, which differs across platforms (and from the local dev box). Compare the sorted set of names. - MST edge order: Kruskal's emit order tie-breaks equal-weight edges via an unstable weight sort, so the order (and, for a 3-cycle with unit weights, the very edge SET) varies by platform. graph_basic / traverse_branch_cov: assert sorted edge columns + total weight where the MST set is unique; drop the per-edge pins where the set itself is non-unique (keep count + total weight). - shortest-path: several equal-length 0->5 routes exist; which BFS returns depends on adjacency iteration order. Assert node count, monotonic per-hop depth, and the fixed endpoints, not the specific intermediate nodes. No engine behavior change; only the test assertions are made robust. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A pivot whose F64 KEY column holds both 0.0 and -0.0 split them into two groups: the group-by equality key compares raw bits (0x8000.. != 0x0) while the column name normalizes via clear_neg_zero, so two same-named "0" columns were emitted (and the value column zero-filled in the lambda fallback). Normalize the pivot KEY column once, up front, in ray_pivot_fn (a thin wrapper around the now-static pivot_fn_impl): if the F64 key holds any -0.0, build a shallow table copy whose pivot column has -0.0 collapsed to +0.0 via clear_neg_zero, then run the pivot on it. Both the DAG fast path and the generic lambda fallback then group 0.0/-0.0 together. The merge must happen BEFORE grouping (so (index,0.0) and (index,-0.0) land in one group and sum); normalizing after the group-by would overwrite, not merge. Cold path only: one O(n) bit-check scan of the pivot key (early-exit when no -0.0), and a table copy solely when -0.0 is present. The group-by per-row hot loop and its -fno-signed-zeros vectorization are untouched. clear_neg_zero is an integer bit-check, so 0.0/-0.0 collapse identically across linux/macos and debug/release. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
be28802 to
33f0b46
Compare
…prox ray_count_distinct_approx allocates one HLL shard per worker via scratch_calloc (zero-initialized: regs==NULL, m==0) and merges shards[1..nw) into shards[0] before estimating. Under dynamic worker scheduling a thread — including the one that owns shards[0] — may process no chunk, leaving shards[0] zero-initialized. ray_hll_merge(dst=shards[0], src) then hits its precision guard (dst->m 0 != src->m) and returns without merging, so every merge is silently skipped and the estimate falls to 0 (e.g. "RAY_I16 big: got 0, expected ~250"). This is non-deterministic — it only fires when the shards[0]-owning worker happens to stay idle — which is why it surfaced as an intermittent CI failure on macos-release. Merge into the first shard that actually saw data (the first with regs != NULL) and estimate from that, instead of assuming shards[0] is initialized. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR fixes
Eight real engine bugs, each previously hidden behind type/count-only test
assertions (a function's result type or row/group count was checked, never
its value — so llvm-cov marked the code "covered" while wrong values passed
silently). Severity-ordered:
updatestrode the source column at a hardcoded 8-byte widthupdate {v:0Ni from:t where:(==k 1i)}overv=[10 20 30]→[null 30 garbage])-0.0→+0.0"0"doubleselect {from:t where:(== k (as 'I32 7))}selected thek=0groupmin/maxstored at int64 widthpearson_corrlacked the binary-aggregator guard0instead of the correlationbyre-evaluated thebySYM-vector as a literalsel_compactallocatedRAY_LIST(==0) viatyped_vec_newtypeerror on(at … 'col)count_distinct_approxmerged worker HLL shards intoshards[0], which a dynamically-scheduled idle worker can leave zero-initializedCommits
fix(group)— pivot-0.0grouping (parse_str() segfaults for certain inputs #2), narrow-int min/max (Byte and byte vector literals not implemented #4), scalar pearson (UB warning when a table is being constructed #5)fix(query)— I32 update corruption + typed-scalar WHERE (Update README.md #1, Converting a string to a symbol may add random garbage #3), multi-key no-agg WHERE (Segfault in binary_call_atomic #6), LIST drop (Error when parsing char literals #7)fix(query)— carry non-key first-of-group columns in multi-key no-agg by-group (completes Segfault in binary_call_atomic #6)fix(exec)— materialize parted columns on all-pass / no-selection filter (Casting an empty list to a symbol vector does not work #8)How they were found
A weak-assertion audit (the two
test:commits) replaced type/count-onlyassertions with real value checks across ~92 RFL files. Where a stronger
assertion exposed a real engine bug, the correct value is kept in a
;; xfail BUG:comment with the buggy value pinned live, so the bug staysvisible while the suite stays green — then resolved here, flipping each xfail
back to the correct value as its fix lands.
Verification
src/change without a value-check that fails before the fix and passes after.Coverage (src only): 89% lines / 85% regions overall; the five touched files 85–96% lines.
🤖 Generated with Claude Code