Skip to content

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762

Open
tlangton3 wants to merge 5 commits intocube-js:masterfrom
tlangton3:cube/fix-tesseract-fk-aggregate-pk-tautology
Open

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762
tlangton3 wants to merge 5 commits intocube-js:masterfrom
tlangton3:cube/fix-tesseract-fk-aggregate-pk-tautology

Conversation

@tlangton3
Copy link
Copy Markdown
Contributor

@tlangton3 tlangton3 commented Apr 28, 2026

Summary

Filtered count measures of shape type: count + sql: "{primary_key}" + filters: [...], when reached through a 1→N (one-to-many) join with CUBESQL_SQL_PUSH_DOWN=true, generated SQL with a tautological FK-aggregate rejoin: LEFT JOIN <source_cube> AS <pk_alias> ON keys.pk = keys.pk. Both sides referenced the inner keys subquery, so the join cross-produced and the count over-included any row whose filter predicates were satisfied anywhere in the source table — silently incorrect and orders of magnitude slower.

The legacy JS path (BaseQuery.js::aggregateSubQuery) generates the correct keys.pk = source.pk join. The bug is exclusive to the Tesseract Rust planner.

Discovered as a production over-count in a downstream Cube deployment: a measure pattern matching ~70 instances across the model returned silently inflated counts whenever queried through a fanout-prone view. Reproduced into the upstream test suite and fixed here.

Changes

  • rust/cubesqlplanner/cubesqlplanner/src/physical_plan_builder/processors/aggregate_multiplied_subquery.rs — bind a dedicated VisitorContext to the right-hand side of the FK-aggregate ON clause in the Cube source arm.
  • rust/cubesqlplanner/cubesqlplanner/src/test_fixtures/schemas/yaml_files/common/integration_multi_fact.yaml — add customers.active_count (filtered count on PK) to the existing 1→N schema.
  • rust/cubesqlplanner/cubesqlplanner/src/tests/integration/multi_fact.rs — new test_multiplied_aggregate_filtered_count_on_pk regression test with a structural tautology assertion over every ON predicate, plus the suite-standard assert_snapshot!(result) on postgres execution row data when integration-postgres is on.

Implementation Details

The Cube source arm of AggregateMultipliedSubquery::process used Expr::Member(dim) for the join's right-hand side, deferring resolution to SQL-emission time. By then, the outer factory's render_references had been populated by the schema-resolution pass — and because the keys subquery (the join's root) projects the PK dimension, ReferencesBuilder::find_reference_column_for_member_in_join resolved it to (keys, <alias>). That mapping intercepted both sides of the ON clause via RenderReferencesSqlNode, producing the tautology.

Fix: snapshot a VisitorContext before the schema-resolution pollution, register pk_cube.name() → pk_cube_alias in cube_name_references, and bind it to the RHS via Expr::new_member_with_context. The dim then renders via AutoPrefixSqlNode against pk_cube_alias — preserves full SQL semantics for custom-SQL or composite PKs, not just simple sql: id cases.

Risk / Compatibility

The change is scoped to the Cube arm of AggregateMultipliedSubquerySource. The MeasureSubquery arm is unchanged — it already used Expr::Reference with pk_cube_alias and was unaffected. No other call sites are at risk: add_subquery_join (builder.rs:141) uses the same Expr::new_member pattern but joins against a Cube-rooted From where ReferencesBuilder returns None for cube sources, so it doesn't trigger the render-reference pollution that caused the bug.

Testing

  • cargo test --lib in rust/cubesqlplanner/: 823 tests pass locally, 0 failures, 14 ignored (Docker-dependent).
  • Pre-fix manifestation: LEFT JOIN customers AS "..." ON (("keys"."customers__id" = "keys"."customers__id")) — tautology, both sides reference keys.
  • Post-fix: LEFT JOIN customers AS "customers_key_customers" ON (("keys"."customers__id" = "customers_key_customers".id)) — RHS correctly resolves to the source-cube alias.
  • Regression coverage: (a) the test's structural tautology scanner walks every ON predicate and fails on any lhs == rhs equality (environment-independent, robust to planner formatting); (b) assert_snapshot!(result) on postgres execution row data when integration-postgres is enabled validates end-to-end correctness against a real database. SQL strings themselves are not snapshotted — that follows the existing convention in the suite (226 result-data snapshots vs 0 SQL-string snapshots), since SQL formatting is unstable across template-rendering passes while row data is stable.

Linear Ticket

N/A — upstream OSS contribution.

When a filtered count measure of shape `type: count` + `sql: "{primary_key}"` + `filters: [...]` is reached through a 1->N edge with `CUBESQL_SQL_PUSH_DOWN=true`, the FK-aggregate rejoin onto the source cube emitted `ON keys.pk = keys.pk` — both sides referencing the inner `keys` subquery. The join cross-produced and the count over-included any row whose filters were satisfied anywhere in the source table.

Root cause: the `Cube` arm of `AggregateMultipliedSubquerySource` built the right-hand side of the join condition as `Expr::Member(MemberExpression::new(dim))`, deferring resolution to whatever rendering context was active at SQL emission. By emission time, the outer factory's `render_references` had been populated by the schema-resolution pass, mapping the PK dim to `(keys, <alias>)` (since the `keys` subquery — the join root — projects it). That mapping then intercepted both sides of the `ON` clause via `RenderReferencesSqlNode`.

Fix: clone `context_factory`, register `pk_cube.name() -> pk_cube_alias` in `cube_name_references`, and bind a `VisitorContext` snapshot (pre-pollution) to the join's right-hand side via `Expr::new_member_with_context`. The dimension renders through `AutoPrefixSqlNode` against `pk_cube_alias`, preserving full SQL semantics for custom-SQL or composite primary keys. The `MeasureSubquery` arm in the same file already used an explicit reference to `pk_cube_alias` and was unaffected.

Adds a regression test exercising the bug shape (`customers.active_count` filtered count on PK, queried by `orders.status` to force the multiplied path) with both a structural tautology assertion over every `ON` predicate and an `insta` snapshot of the generated SQL.
@github-actions github-actions Bot added rust Pull requests that update Rust code pr:community Contribution from Cube.js community members. labels Apr 28, 2026
The SQL string captured by `build_sql(..)` differs between local and CI
environments: when `--features integration-postgres` is on and Docker is
available, `try_execute_pg` runs a second template-rendering pass that
emits CTE-wrapped SQL; the single-pass `build_sql` output is the
inline-subquery form. Same code, different rendering pipelines reachable.

The cubesqlplanner suite convention is to snapshot postgres execution
results (226 such files in the repo), not SQL strings (1 file — this one).
Drop the SQL snapshot to follow convention. The structural tautology
scanner above stays — it's the bug-specific, environment-independent
guard. The existing `assert_snapshot!(result)` on row data continues to
verify end-to-end correctness when CI's postgres container is up.
Adds the canonical row-data snapshot the regression test needs when
`integration-postgres` is enabled. Validates that the filtered count
on a primary key produces the correct distinct-customer counts (1 for
'completed', 1 for 'pending', 0 for orders.status NULL) — pre-fix the
buggy cross-join over-counted these.
The `active_count` measure was defined with `sql: "{CUBE}.id"` and
filter `sql: "{CUBE}.name LIKE 'A%'"`. Both forms parse to
`SqlDependency::CubeRef` only — no `SqlDependency::Symbol` on the `id`
or `name` dimensions. The FK-aggregate bug only manifests when the
measure transitively pulls a primary-key dimension into the outer
factory's `render_references` (via a Symbol dep), which is what makes
the join condition's RHS misresolve to the inner `keys` subquery.

Switch to `sql: "{id}"` and filter `sql: "{name} LIKE 'A%'"` — these
are member references that produce Symbol deps on the dimensions, so
`references_builder.resolve_references_for_member` recursively
populates render_references for the PK. This matches the original
production bug shape (`sql: "{primary_key}"` with filters) and
genuinely exercises the buggy code path. With the fix in place the
join condition correctly resolves to `keys.<pk> = pk_cube_alias.<col>`;
without the fix it would collapse to `keys.<pk> = keys.<pk>`.
@tlangton3 tlangton3 marked this pull request as ready for review April 28, 2026 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr:community Contribution from Cube.js community members. rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant