Phase 3B introduces the most delicate authorized expansion to the gcp-sql baseline: Controlled Joins. To prevent Cartesian explosion, catastrophic remote compute overruns, and complex nested logic that could mask data leakage, the surface is ruthlessly restricted to atomic, exact-match relational bridging.
Any query that deviates from this single accepted structure fails closed:
- Single Join Only: A query may contain at most one `JOIN` operation.
- INNER JOIN Only: Only an explicit `INNER JOIN` (or a bare `JOIN`, which defaults to `INNER`) is permitted.
- Equality Predicates Only: The `ON` clause must consist of a strict equality check between explicit column references (e.g., `ON a.instrument_id = b.instrument_id`).
- No Join Chains: A single bridge between exactly two referenced tables (or self-aliases) is the absolute ceiling. Self-joins (joining a table to itself via distinct aliases, e.g., `MarketTick t1 JOIN MarketTick t2`) are explicitly allowed under this subset, provided all other constraints are met.
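The following minimal Python sketch makes the accepted shape concrete. It runs over a hypothetical, deliberately tiny AST (`TableRef`, `ColumnRef`, `Comparison`, `Join`, `Select`) rather than any real parser's node types. `validate_join_surface` is an illustrative name. `TranspilationError` is defined locally as a stand-in for the transpiler's own exception. One rule goes beyond the list above and is an assumption: the two sides of the equality must reference different join inputs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


class TranspilationError(Exception):
    """Stand-in for the transpiler's rejection error (name taken from the spec)."""


@dataclass(frozen=True)
class TableRef:
    name: str
    alias: Optional[str] = None

    @property
    def binding(self) -> str:
        # Columns are qualified by the alias if one is given, else by the table name.
        return self.alias or self.name


@dataclass(frozen=True)
class ColumnRef:
    qualifier: str  # alias or table name, e.g. "t1" in t1.instrument_id
    column: str


@dataclass(frozen=True)
class Comparison:
    op: str
    left: object   # ColumnRef, or any other expression node (literal, function call, ...)
    right: object


@dataclass(frozen=True)
class Join:
    kind: str                  # "INNER", "LEFT", ...; a bare JOIN is parsed as "INNER"
    table: TableRef
    on: Optional[Comparison]   # None models USING / NATURAL / a missing ON


@dataclass(frozen=True)
class Select:
    base: TableRef
    joins: tuple = ()


def validate_join_surface(query: Select) -> None:
    """Raise TranspilationError unless the query fits the single inner equi-join shape."""
    if not query.joins:
        return  # no join, so the Phase 3B rules do not apply
    if len(query.joins) > 1:
        raise TranspilationError("at most one JOIN is permitted")
    join = query.joins[0]
    if join.kind != "INNER":
        raise TranspilationError(f"{join.kind} JOIN is not permitted")
    pred = join.on
    if not (
        isinstance(pred, Comparison)
        and pred.op == "="
        and isinstance(pred.left, ColumnRef)
        and isinstance(pred.right, ColumnRef)
    ):
        raise TranspilationError("ON must be one equality between two column references")
    left, right = query.base.binding, join.table.binding
    if left == right:
        # Self-joins are allowed only via distinct aliases (MarketTick t1 JOIN MarketTick t2).
        raise TranspilationError("both sides of the join need distinct aliases")
    if {pred.left.qualifier, pred.right.qualifier} != {left, right}:
        # Assumption: each side of the equality must reference a different join input.
        raise TranspilationError("ON must compare one column from each side")
```

The allowed self-join from the list above passes silently:

```python
eq = Comparison("=", ColumnRef("t1", "instrument_id"), ColumnRef("t2", "instrument_id"))
validate_join_surface(Select(TableRef("MarketTick", "t1"), (Join("INNER", TableRef("MarketTick", "t2"), eq),)))
```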
The following constructs are strictly banned and must raise a deterministic `TranspilationError`:
- Outer & Cross Joins: `LEFT`, `RIGHT`, `FULL`, `OUTER`, and user-facing `CROSS` joins.
- Multiple Joins: Any attempt to chain joins (e.g., `A JOIN B JOIN C`).
- Non-Equality Predicates: Range joins (`>`), inequality (`!=`), or logic incorporating expressions/functions inside the `ON` clause.
- Implicit Relational Bridges: `NATURAL` joins, `USING` clauses, or comma-separated `FROM A, B` implicit cross joins.
- Mixed Complexity: No joins inside derived tables/subqueries, and no joins combined with subqueries or aggregations until independently proven safe.
- Correlated Equivalents: Emulating join-like behavior via correlated subqueries remains fully banned under the Phase 3A specs.
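Rejection must be deterministic. One way to get that is to evaluate the rules in a fixed order, so a query that violates several of them always reports the same one first. The sketch below is a conservative lexical pre-screen built on that assumption. It blanks out string literals and backquoted identifiers, tokenizes the remainder, and inspects the tokens immediately before each `JOIN`. `prescreen_joins` is a hypothetical name, and `TranspilationError` is again a local stand-in. This check cannot reliably detect comma-separated `FROM A, B` joins, joins nested in subqueries, or expression predicates inside `ON`; those still need the parsed AST. Treat it as a fail-closed first line of defense, not a replacement for AST validation.

```python
import re


class TranspilationError(Exception):
    """Stand-in for the transpiler's rejection error (name taken from the spec)."""


# String literals and backquoted identifiers are blanked out first, so that
# e.g. 'LEFT JOIN' inside a literal is not mistaken for a join.
_QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|`[^`]*`")
_WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Join modifiers, checked in this fixed order so a given query always
# produces the same message.
_JOIN_MODIFIERS = (
    ("NATURAL", "NATURAL joins are not permitted"),
    ("CROSS", "CROSS JOIN is not permitted"),
    ("LEFT", "outer joins are not permitted"),
    ("RIGHT", "outer joins are not permitted"),
    ("FULL", "outer joins are not permitted"),
    ("OUTER", "outer joins are not permitted"),
)


def prescreen_joins(sql: str) -> None:
    """Raise TranspilationError for banned join syntax visible at the token level."""
    tokens = [t.upper() for t in _WORD.findall(_QUOTED.sub(" ", sql))]
    join_positions = [i for i, t in enumerate(tokens) if t == "JOIN"]
    for i in join_positions:
        # Up to two tokens before JOIN, to catch "NATURAL LEFT JOIN", "LEFT OUTER JOIN".
        preceding = tokens[max(0, i - 2):i]
        for keyword, message in _JOIN_MODIFIERS:
            if keyword in preceding:
                raise TranspilationError(message)
    if "USING" in tokens:
        raise TranspilationError("USING clauses are not permitted")
    if len(join_positions) > 1:
        raise TranspilationError("chained joins are not permitted")
```

With this ordering, `NATURAL LEFT JOIN` always reports the `NATURAL` rule rather than the outer-join rule. A token-level check can over-reject (for example, an alias that happens to be spelled like a modifier), which is acceptable for a fail-closed gate.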
Internal DuckDB artifacts previously accepted for bounded subquery execution (including `CROSS_PRODUCT`, internal `HASH_JOIN`, and internal `first()` lowering) do not constitute user-facing join authorization and may not be cited as evidence of Phase 3B support.
- Strict Transpiler Isolation: `test_internal_subquery_artifacts_explicit` and all previous tests must remain fully passing without interference.
- Corpus Validation: Both allowed (single inner equality) and rejected (multi-join, outer join, expression predicates) scenarios must be tracked as static `.sql` fixtures.
- Live BigQuery Parity Proof: Execution must run over the actual network, validating pyarrow schema translation for merged tabular output.
- No CLI Drift: All UI boundary outputs inside `test_gcp_cli_golden.py` must remain pristine.
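For the corpus-validation requirement, here is a minimal sketch of a fixture runner. It assumes a hypothetical layout with accepted queries under `allowed/` and rejected ones under `rejected/`; neither directory name comes from the spec. `discover_fixtures` and `run_corpus` are illustrative names, and `transpile` stands in for whatever entry point the transpiler actually exposes. An empty corpus is reported as a failure, so a misplaced fixture directory cannot pass vacuously.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


class TranspilationError(Exception):
    """Stand-in for the transpiler's rejection error (name taken from the spec)."""


def discover_fixtures(root: Path) -> list[tuple[Path, bool]]:
    """Return (fixture_path, should_pass) pairs in a stable, sorted order.

    Assumes a hypothetical layout: root/allowed/*.sql and root/rejected/*.sql.
    """
    cases: list[tuple[Path, bool]] = []
    for subdir, should_pass in (("allowed", True), ("rejected", False)):
        folder = root / subdir
        if folder.is_dir():
            cases += [(p, should_pass) for p in sorted(folder.glob("*.sql"))]
    return cases


def run_corpus(root: Path, transpile: Callable[[str], object]) -> list[str]:
    """Run every fixture through `transpile`; return human-readable failures."""
    cases = discover_fixtures(root)
    if not cases:
        return [f"no .sql fixtures found under {root}"]
    failures: list[str] = []
    for path, should_pass in cases:
        sql = path.read_text(encoding="utf-8")
        try:
            transpile(sql)
            accepted = True
        except TranspilationError:
            accepted = False
        if accepted != should_pass:
            expected = "accept" if should_pass else "reject"
            failures.append(f"{path.name}: expected transpiler to {expected}")
    return failures
```

In a pytest suite, the natural assertion is that `run_corpus(...)` returns an empty list. Alternatively, parametrizing a test over `discover_fixtures(...)` gives one test ID per `.sql` fixture, which makes a regression easier to pinpoint.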