CySQL Query Optimizer Framework#88
Conversation
Late-materialize variable-length path edge arrays from compact expansion path ID arrays. Avoid carrying expansion edge composites through intermediate projections unless the relationship binding itself remains observable.
Represent fixed relationships used only by path bindings as scalar edge IDs through intermediate projections. Reconstruct edge composites from those IDs only when path values or path relationship arrays are needed, while keeping directly referenced relationship variables materialized as full composites.
WalkthroughAdds a PG Cypher optimizer (analysis, rules, lowering) and integrates its decisions into translation. Introduces new AST nodes/utilities, regenerates SQL fixtures, expands integration tests, and enhances the benchmark tool to capture PostgreSQL EXPLAIN with optimizer metadata. ChangesBenchmark EXPLAIN capture and reporting
Cypher AST deep-copy and helpers
PGSQL AST additions, formatting, walking, and renaming
Optimizer core: analysis, locality, lowering plan, rules, and tests
Translator integration with optimizer decisions and new flows
SQL translation test fixtures regenerated
Integration tests, data fixtures, and metrics allowlist
Sequence Diagram(s)sequenceDiagram
participant Client
participant Optimizer
participant Translator
participant PostgreSQL
Client->>Optimizer: Optimize(cypher)
Optimizer-->>Translator: Plan (rules, attachments, lowerings)
Translator->>PostgreSQL: Execute generated SQL
PostgreSQL-->>Translator: Rows/Plan
Translator-->>Client: Results + Optimization summary
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (2)
cypher/models/pgsql/translate/tracking.go (1)
347-353: ⚡ Quick winMake
LookupDataTypealias-aware.This helper goes straight to
Lookup, soLookupDataType(alias)returnsfalsefor identifiers thatAliasedLookup/LookupStringalready resolve. That makes the new exported API easy to misuse anywhere aliased bindings are visible.Possible fix
func (s *Scope) LookupDataType(identifier pgsql.Identifier) (pgsql.DataType, bool) { - if binding, bound := s.Lookup(identifier); !bound { - return "", false - } else { + if binding, bound := s.Lookup(identifier); bound { + return binding.DataType, true + } + + if binding, bound := s.AliasedLookup(identifier); bound { return binding.DataType, true } + + return "", false }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cypher/models/pgsql/translate/tracking.go` around lines 347 - 353, LookupDataType currently calls s.Lookup directly and thus fails for aliased identifiers; change it to resolve aliases first (use s.AliasedLookup(identifier) or s.LookupString on the identifier string equivalent) and return the resolved binding's DataType and true when found, otherwise return "" and false—i.e., replace the s.Lookup call in LookupDataType with an alias-aware lookup using AliasedLookup/LookupString and return binding.DataType, true when present.integration/cypher_test.go (1)
235-237: Verify integration execution is scoped byCONNECTION_STRING.Before running
make test_allfor these integration additions, ensureCONNECTION_STRINGis set and that only the backend selected by its scheme is exercised while other backend tests self-skip.
As per coding guidelines, "Integration suites must consumeCONNECTION_STRINGenvironment variable. IfCONNECTION_STRINGis not present in the LLM context, ask the user to add it before runningmake test_all" and "Runmake test_allonly for the backend selected by the scheme inCONNECTION_STRING."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration/cypher_test.go` around lines 235 - 237, The integration tests must be gated by the CONNECTION_STRING environment setting: add a setup check (preferably in the test package's TestMain or the top-level test runner that invokes the switch handling cases like the one that calls assertRelationshipListKinds and decodeAssertionValue) that reads CONNECTION_STRING, if it's missing call t.Skipf or return an error instructing the user to set CONNECTION_STRING before running make test_all, and if present parse the URI scheme (text before "://") and use that value to only run tests for the selected backend (skip other backend-specific tests with t.Skipf when the scheme does not match); ensure this gating is applied globally so each case (e.g., relationship_list_kinds) self-skips when the scheme is not applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cypher/models/pgsql/optimize/locality.go`:
- Around line 111-145: The function SelectReferencesOnlyLocalIdentifiers
currently seeds all from/join aliases up front via AddFromClauseBindings, which
lets an ON constraint see later join aliases as local; change the algorithm to
validate JOIN constraints against an incremental scope instead: copy localScope
into scopedIdentifiers, call AddFromClauseBindings only for the from clause
source (or add the source alias binding) then for each join in fromClause.Joins
validate join.Table with FromExpressionReferencesOnlyLocalIdentifiers and
validate join.JoinOperator.Constraint with
ExpressionReferencesOnlyLocalIdentifiers using the current scopedIdentifiers,
and only after those checks add the join's alias/bindings into scopedIdentifiers
(mirroring how AddFromClauseBindings would add them) so later joins see earlier
aliases but not vice versa.
In `@cypher/models/pgsql/optimize/lowering_plan.go`:
- Around line 611-640: The loop that appends LimitPushdownDecision entries is
missing a guard to skip OPTIONAL MATCH clauses; update the loop over
readingClauses in lowering_plan.go to check the clause's optionality (e.g.
readingClause.Optional or readingClause.OptionalMatch) right after the existing
nil/Match nil checks and continue when true so no LimitPushdown entries are
added for optional matches; keep the existing logic that computes patternTarget,
traversalStepsForPattern,
LimitPushdownTraversalCTE/LimitPushdownShortestPathHarness, and appends to
plan.LimitPushdown unchanged for non-optional clauses.
In `@cypher/models/pgsql/optimize/selectivity.go`:
- Around line 112-117: popSelectivity currently calls Selectivity(), which
returns the root frame (s.selectivityStack[0]), causing pops to return the wrong
value; update popSelectivity in measureSelectivityVisitor to read the top frame
directly (value := s.selectivityStack[len(s.selectivityStack)-1]) before slicing
the stack down (s.selectivityStack = s.selectivityStack[:len-1]) so it returns
the top-most frame for nested scopes and NOT(...) computations instead of the
root.
In `@cypher/models/pgsql/optimize/source_references.go`:
- Around line 85-87: The branch handling *cypher.Variable in
source_references.go incorrectly treats declaration-context variables as
references; update the case in the switch so that it calls
s.addVariable(typedNode) only when matchPatternDeclarationDepth == 0 (i.e., the
variable is not inside a match pattern declaration). In other words, guard the
addVariable call with a check of s.matchPatternDeclarationDepth (or the named
variable/method used for that depth) and skip marking variables as referenced
when inside a declaration context.
In `@cypher/models/pgsql/test/translation_cases/pattern_binding.sql`:
- Around line 89-90: The query recomputes ordered_edges_to_path(...) multiple
times when evaluating NONE over TAIL(TAIL(NODES(p))), causing repeated edge
scans; refactor by materializing the path once in an inner CTE (stage the
matched path as p or a CTE column e.g. p_path / ep0 inside s0/s1) using
ordered_edges_to_path(...) and expose its (p).nodes as a column, then in the
outer query reference that materialized p.nodes for the NONE(...) /
TAIL(TAIL(...)) predicate instead of re-invoking ordered_edges_to_path(...)
repeatedly.
In `@cypher/models/pgsql/translate/expansion.go`:
- Around line 2304-2306: The UNWIND sources collected via
unwindClauses/unwindFromClauses are only attached in buildExpansionPatternRoot
and are not propagated for shortest-path roots built by
buildShortestPathsExpansionPattern/Build*ShortestPathsRoot, causing UNWIND ...
MATCH p = shortestPath(...) to lose row sources/bindings; refactor so the
unwindSources plumbing is applied in the shared root-builder path used by both
buildExpansionPatternRoot and
buildShortestPathsExpansionPattern/Build*ShortestPathsRoot (move the attach
logic into the common root construction function), ensure seed
predicates/binding include the unwindSources there, and add a regression test
exercising UNWIND ... MATCH p = shortestPath(...) to validate the fix.
In `@cypher/models/pgsql/translate/model.go`:
- Around line 170-181: The guard in canMaterializeEndpointPairFilterForStep
currently uses hasLocalEndpointConstraint which allows label-only predicates;
change both endpoint checks to require pair-aware predicates by calling
hasPairAwareEndpointConstraint (or equivalent search-column check) for
expansionModel.PrimerNodeConstraints against traversalStep.LeftNode.Identifier
and expansionModel.TerminalNodeConstraints against
traversalStep.RightNode.Identifier so materializedEndpointPairFilterStatement
only runs when both endpoints supply pair-aware constraints and avoid
enumerating explosive cross-products.
In `@cypher/models/pgsql/translate/predicate.go`:
- Around line 136-149: The current code returns from buildPatternPredicates
after handling one optimized existence-placement predicate (inside the block
that calls s.usePatternPredicateExistencePlacement and
s.buildOptimizedRelationshipExistPredicate), which prevents later predicates in
the same patternPart from being processed; instead, after assigning
predicateFuture.SyntaxNode and calling
s.recordLowering(optimize.LoweringPredicatePlacement) do not return nil—skip to
the next predicate in the loop (e.g., replace the early return with a continue
or otherwise let the loop continue) so remaining predicates are built and
translated; keep the existing calls to s.usePatternPredicateExistencePlacement,
s.buildOptimizedRelationshipExistPredicate, predicateFuture.SyntaxNode
assignment, and s.recordLowering as-is.
---
Nitpick comments:
In `@cypher/models/pgsql/translate/tracking.go`:
- Around line 347-353: LookupDataType currently calls s.Lookup directly and thus
fails for aliased identifiers; change it to resolve aliases first (use
s.AliasedLookup(identifier) or s.LookupString on the identifier string
equivalent) and return the resolved binding's DataType and true when found,
otherwise return "" and false—i.e., replace the s.Lookup call in LookupDataType
with an alias-aware lookup using AliasedLookup/LookupString and return
binding.DataType, true when present.
In `@integration/cypher_test.go`:
- Around line 235-237: The integration tests must be gated by the
CONNECTION_STRING environment setting: add a setup check (preferably in the test
package's TestMain or the top-level test runner that invokes the switch handling
cases like the one that calls assertRelationshipListKinds and
decodeAssertionValue) that reads CONNECTION_STRING, if it's missing call t.Skipf
or return an error instructing the user to set CONNECTION_STRING before running
make test_all, and if present parse the URI scheme (text before "://") and use
that value to only run tests for the selected backend (skip other
backend-specific tests with t.Skipf when the scheme does not match); ensure this
gating is applied globally so each case (e.g., relationship_list_kinds)
self-skips when the scheme is not applicable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 04b35b30-d9d0-4fc9-9ae1-a1cd8ffecbf5
📒 Files selected for processing (66)
cmd/benchmark/README.mdcmd/benchmark/explain.gocmd/benchmark/main.gocmd/benchmark/report.gocmd/benchmark/report_test.gocmd/benchmark/runner.gocmd/benchmark/scenarios.gocypher/models/cypher/copy.gocypher/models/cypher/copy_test.gocypher/models/cypher/model.gocypher/models/pgsql/format/format.gocypher/models/pgsql/model.gocypher/models/pgsql/optimize/analysis.gocypher/models/pgsql/optimize/analysis_test.gocypher/models/pgsql/optimize/locality.gocypher/models/pgsql/optimize/lowering.gocypher/models/pgsql/optimize/lowering_plan.gocypher/models/pgsql/optimize/optimizer.gocypher/models/pgsql/optimize/optimizer_test.gocypher/models/pgsql/optimize/pattern_predicates.gocypher/models/pgsql/optimize/reordering.gocypher/models/pgsql/optimize/selectivity.gocypher/models/pgsql/optimize/source_references.gocypher/models/pgsql/pgtypes.gocypher/models/pgsql/test/translation_cases/create.sqlcypher/models/pgsql/test/translation_cases/delete.sqlcypher/models/pgsql/test/translation_cases/multipart.sqlcypher/models/pgsql/test/translation_cases/nodes.sqlcypher/models/pgsql/test/translation_cases/pattern_binding.sqlcypher/models/pgsql/test/translation_cases/pattern_expansion.sqlcypher/models/pgsql/test/translation_cases/pattern_rewriting.sqlcypher/models/pgsql/test/translation_cases/quantifiers.sqlcypher/models/pgsql/test/translation_cases/scalar_aggregation.sqlcypher/models/pgsql/test/translation_cases/shortest_paths.sqlcypher/models/pgsql/test/translation_cases/stepwise_traversal.sqlcypher/models/pgsql/test/translation_cases/update.sqlcypher/models/pgsql/translate/constraints.gocypher/models/pgsql/translate/expansion.gocypher/models/pgsql/translate/function.gocypher/models/pgsql/translate/limit_pushdown_test.gocypher/models/pgsql/translate/match.gocypher/models/pgsql/translate/model.gocypher/models/pgsql/translate/optimizer_safety_test.gocypher/models/pgsql/translate/path_functions.gocypher/models/pgsql/translate/pattern.gocypher/models/pgsql/translate/predicate.gocypher/models/pgsql/translate/projection.gocypher/models/pgsql/translate/renamer.gocypher/models/pgsql/translate/renamer_test.gocypher/models/pgsql/translate/selectivity.gocypher/models/pgsql/translate/tracking.gocypher/models/pgsql/translate/translator.gocypher/models/pgsql/translate/traversal.gocypher/models/pgsql/translate/with.gocypher/models/walk/walk_pgsql.godocs/optimization-pass-memory.mdintegration/cypher_test.gointegration/testdata/adcs_fanout.jsonintegration/testdata/cases/aggregation_inline.jsonintegration/testdata/cases/expansion_inline.jsonintegration/testdata/cases/multipart_inline.jsonintegration/testdata/cases/optimizer_inline.jsonintegration/testdata/cases/pattern_binding_inline.jsonintegration/testdata/cases/unwind_inline.jsonintegration/testdata/templates/pattern_shapes.jsontools/metrics/internal/metrics/quality.go
💤 Files with no reviewable changes (1)
- cypher/models/pgsql/test/translation_cases/pattern_rewriting.sql
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cypher/models/pgsql/translate/expansion.go (1)
1212-1235:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDon't cross-join prior UNWIND rows into recursive fronts without a row key.
The recursive front rows here only preserve traversal state, so once
appendUnwindSourcesIfReferencedprepends the previous frame and UNWIND sources, every frontier row is evaluated against every matching outer UNWIND row. Two outer rows that share the same endpoints but differ only by the UNWIND-bound value can now leak or duplicate paths across each other. This needs an outer-row discriminator threaded through the front/harness state, or recursive UNWIND-dependent constraints should stay unsupported for now.Also applies to: 1286-1289, 1388-1403, 1447-1450
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cypher/models/pgsql/translate/expansion.go` around lines 1212 - 1235, The recursive frontier projection (nextQuery.Projection) is missing an outer-row discriminator so UNWIND/outer rows will be cross-joined into recursive fronts; fix by threading a unique outer-row key through the front and harness state: add a dedicated discriminator column (e.g., outerRowID or unwindRowKey) to the projection alongside expansionModel.Frame.Binding.Identifier, expansionRootID, expansionDepth and expansionPath, ensure appendUnwindSourcesIfReferenced populates that discriminator when prepending previous frame/UNWIND sources, include the discriminator in any path concatenations and in forwardTerminalSatisfaction/terminal projection logic so recursive rows are only matched to their originating outer UNWIND row, and apply the same change at the other affected blocks referenced (around the expansions at the other ranges).
🧹 Nitpick comments (1)
integration/harness.go (1)
91-94: SetCONNECTION_STRINGbefore running integration suitePlease add
CONNECTION_STRINGin your environment before runningmake test_all, and run only the backend selected by that scheme (other backend tests should skip).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration/harness.go` around lines 91 - 94, The test harness reads CONNECTION_STRING into connStr in integration/harness.go and calls t.Skip() when it's empty; before running make test_all set the CONNECTION_STRING environment variable to the backend you want to test (e.g., export CONNECTION_STRING="postgres://...") so connStr is populated and the selected backend tests run; ensure you only run the backend matching that scheme so other backend test suites remain skipped by the existing connStr check and t.Skip() guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cypher/models/pgsql/optimize/locality.go`:
- Around line 63-66: QueryReferencesOnlyLocalIdentifiers currently treats any
non-nil query.CommonTableExpressions as “has CTEs”, but NewQueryPart initializes
queries with an empty &pgsql.With{}, causing false positives; update
QueryReferencesOnlyLocalIdentifiers to detect real CTEs by checking whether
CommonTableExpressions actually contains entries (e.g. test
len(query.CommonTableExpressions.Ctes) > 0 or call an existing IsEmpty/HasCTEs
helper) instead of only checking for nil, so empty pgsql.With{} is treated as no
CTEs and pushdown logic remains enabled.
- Around line 8-14: FlattenConjunction only handles pointer-form nodes
(*pgsql.BinaryExpression), so value-form BinaryExpression nodes are left
unflattened; update FlattenConjunction to recognize both pointer and value forms
by using a type switch (or separate type assertions) to handle
pgsql.BinaryExpression as well as *pgsql.BinaryExpression, check Operator ==
pgsql.OperatorAnd for either case, and recurse on the appropriate
LOperand/ROperand fields (referencing FlattenConjunction,
pgsql.BinaryExpression, OperatorAnd, LOperand, ROperand) so both value and
pointer trees are flattened.
In `@cypher/models/pgsql/optimize/selectivity.go`:
- Around line 170-205: The code currently grants
selectivityWeightEntityIDReference as soon as one side is an ID ref (in
BinaryExpression handling), which wrongly treats non-point operators (e.g., >,
<, !=) as point-lookups; change it so the ID point-lookup bonus is only applied
for true point operators: compute lOperandIsID/rOperandIsID using
isColumnIDRef(typedNode.LOperand/ROperand) as before, but remove the
unconditional s.addSelectivity(selectivityWeightEntityIDReference) calls and
instead, inside the operator switch for typedNode.Operator, only call
s.addSelectivity(selectivityWeightEntityIDReference) in the cases that represent
point predicates (e.g., pgsql.OperatorEquals, pgsql.OperatorIn, and
pgsql.OperatorIs) and only when exactly one side is an ID ref (lOperandIsID !=
rOperandIsID); leave joins (both ID refs) and other operator cases unchanged.
In `@cypher/models/pgsql/translate/expansion.go`:
- Around line 1192-1195: The current logic calls appendUnwindSourcesIfReferenced
using only expansionModel.EdgeConstraints, which misses UNWIND bindings
referenced exclusively by
forwardTerminalSatisfaction/backwardTerminalSatisfaction (they add
TerminalNodeSatisfactionProjection/PrimerNodeSatisfactionProjection to the
SELECT). Update the decision to include UNWIND sources by inspecting the full
select body: pass a combined set of referenced projections (edge constraints
plus any projections produced by forwardTerminalSatisfaction and
backwardTerminalSatisfaction) into appendUnwindSourcesIfReferenced (or call it
twice with both sets) so that nextQuery (and other similar call sites) include
UNWIND FROM clauses whenever those endpoint satisfaction projections reference
UNWIND bindings; apply the same change at the other occurrences noted.
In `@cypher/models/pgsql/translate/optimizer_safety_test.go`:
- Around line 421-423: Test currently assumes the primer SQL is stored under the
key "pi0" (using translation.Parameters["pi0"] and hasPrimerQuery); instead,
search translation.Parameters for the primer SQL by value/content (iterate
translation.Parameters map, type-assert values to string and compare to the
expected primer SQL text), assert that a matching entry exists, capture that
value into primerQuery for subsequent assertions, and remove the hard-coded
"pi0" key usage so the test no longer depends on internal parameter ordering.
---
Outside diff comments:
In `@cypher/models/pgsql/translate/expansion.go`:
- Around line 1212-1235: The recursive frontier projection
(nextQuery.Projection) is missing an outer-row discriminator so UNWIND/outer
rows will be cross-joined into recursive fronts; fix by threading a unique
outer-row key through the front and harness state: add a dedicated discriminator
column (e.g., outerRowID or unwindRowKey) to the projection alongside
expansionModel.Frame.Binding.Identifier, expansionRootID, expansionDepth and
expansionPath, ensure appendUnwindSourcesIfReferenced populates that
discriminator when prepending previous frame/UNWIND sources, include the
discriminator in any path concatenations and in
forwardTerminalSatisfaction/terminal projection logic so recursive rows are only
matched to their originating outer UNWIND row, and apply the same change at the
other affected blocks referenced (around the expansions at the other ranges).
---
Nitpick comments:
In `@integration/harness.go`:
- Around line 91-94: The test harness reads CONNECTION_STRING into connStr in
integration/harness.go and calls t.Skip() when it's empty; before running make
test_all set the CONNECTION_STRING environment variable to the backend you want
to test (e.g., export CONNECTION_STRING="postgres://...") so connStr is
populated and the selected backend tests run; ensure you only run the backend
matching that scheme so other backend test suites remain skipped by the existing
connStr check and t.Skip() guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d2be8088-4497-4814-b198-a244cc83c553
📒 Files selected for processing (24)
cypher/models/pgsql/optimize/locality.gocypher/models/pgsql/optimize/lowering_plan.gocypher/models/pgsql/optimize/optimizer_test.gocypher/models/pgsql/optimize/selectivity.gocypher/models/pgsql/optimize/source_references.gocypher/models/pgsql/test/translation_cases/multipart.sqlcypher/models/pgsql/test/translation_cases/pattern_binding.sqlcypher/models/pgsql/test/translation_cases/pattern_expansion.sqlcypher/models/pgsql/translate/constraints.gocypher/models/pgsql/translate/constraints_test.gocypher/models/pgsql/translate/expansion.gocypher/models/pgsql/translate/function.gocypher/models/pgsql/translate/function_test.gocypher/models/pgsql/translate/model.gocypher/models/pgsql/translate/optimizer_safety_test.gocypher/models/pgsql/translate/path_functions.gocypher/models/pgsql/translate/pattern.gocypher/models/pgsql/translate/predicate.gocypher/models/pgsql/translate/predicate_test.gocypher/models/pgsql/translate/projection.gocypher/models/pgsql/translate/selectivity.gocypher/models/pgsql/translate/tracking.gocypher/models/pgsql/translate/tracking_test.gointegration/harness.go
| // FlattenConjunction collects the leaf operands of a left-recursive AND chain. | ||
| func FlattenConjunction(expr pgsql.Expression) []pgsql.Expression { | ||
| if bin, typeOK := expr.(*pgsql.BinaryExpression); !typeOK || bin.Operator != pgsql.OperatorAnd { | ||
| return []pgsql.Expression{expr} | ||
| } else { | ||
| return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...) | ||
| } |
There was a problem hiding this comment.
Handle value-form pgsql.BinaryExpression here too.
The translator supports both pointer and value binary nodes, but this helper only unwraps *pgsql.BinaryExpression. Value-form A AND B trees stay unsplit, so mixed local/external predicates stop being partitioned and pushed down.
Suggested fix
func FlattenConjunction(expr pgsql.Expression) []pgsql.Expression {
- if bin, typeOK := expr.(*pgsql.BinaryExpression); !typeOK || bin.Operator != pgsql.OperatorAnd {
- return []pgsql.Expression{expr}
- } else {
- return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...)
- }
+ switch bin := expr.(type) {
+ case *pgsql.BinaryExpression:
+ if bin == nil || bin.Operator != pgsql.OperatorAnd {
+ return []pgsql.Expression{expr}
+ }
+ return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...)
+
+ case pgsql.BinaryExpression:
+ if bin.Operator != pgsql.OperatorAnd {
+ return []pgsql.Expression{expr}
+ }
+ return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...)
+
+ default:
+ return []pgsql.Expression{expr}
+ }
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cypher/models/pgsql/optimize/locality.go` around lines 8 - 14,
FlattenConjunction only handles pointer-form nodes (*pgsql.BinaryExpression), so
value-form BinaryExpression nodes are left unflattened; update
FlattenConjunction to recognize both pointer and value forms by using a type
switch (or separate type assertions) to handle pgsql.BinaryExpression as well as
*pgsql.BinaryExpression, check Operator == pgsql.OperatorAnd for either case,
and recurse on the appropriate LOperand/ROperand fields (referencing
FlattenConjunction, pgsql.BinaryExpression, OperatorAnd, LOperand, ROperand) so
both value and pointer trees are flattened.
| func QueryReferencesOnlyLocalIdentifiers(query pgsql.Query, localScope *pgsql.IdentifierSet) bool { | ||
| if query.CommonTableExpressions != nil { | ||
| return false | ||
| } |
There was a problem hiding this comment.
Don't treat an empty WITH placeholder as “has CTEs”.
QueryReferencesOnlyLocalIdentifiers rejects every query with a non-nil CommonTableExpressions, but NewQueryPart initializes queries with &pgsql.With{} by default. That makes locality checks fail even when the query has no CTEs and shuts off the pushdown logic that depends on this helper.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cypher/models/pgsql/optimize/locality.go` around lines 63 - 66,
QueryReferencesOnlyLocalIdentifiers currently treats any non-nil
query.CommonTableExpressions as “has CTEs”, but NewQueryPart initializes queries
with an empty &pgsql.With{}, causing false positives; update
QueryReferencesOnlyLocalIdentifiers to detect real CTEs by checking whether
CommonTableExpressions actually contains entries (e.g. test
len(query.CommonTableExpressions.Ctes) > 0 or call an existing IsEmpty/HasCTEs
helper) instead of only checking for nil, so empty pgsql.With{} is treated as no
CTEs and pushdown logic remains enabled.
| case *pgsql.BinaryExpression: | ||
| var ( | ||
| lOperandIsID = isColumnIDRef(typedNode.LOperand) | ||
| rOperandIsID = isColumnIDRef(typedNode.ROperand) | ||
| ) | ||
|
|
||
| if lOperandIsID && !rOperandIsID { | ||
| // Point lookup: n0.id = <literal or param>; highly selective. | ||
| s.addSelectivity(selectivityWeightEntityIDReference) | ||
| } else if rOperandIsID && !lOperandIsID { | ||
| // Canonically unusual, but handle it the same. | ||
| s.addSelectivity(selectivityWeightEntityIDReference) | ||
| } | ||
|
|
||
| // If both sides are ID refs, this is a join condition; do not score as a point lookup. | ||
| switch typedNode.Operator { | ||
| case pgsql.OperatorOr: | ||
| s.addSelectivity(selectivityWeightDisjunction) | ||
|
|
||
| case pgsql.OperatorNotEquals: | ||
| s.addSelectivity(selectivityWeightNotEquals) | ||
|
|
||
| case pgsql.OperatorAnd: | ||
| s.addSelectivity(selectivityWeightConjunction) | ||
|
|
||
| case pgsql.OperatorLessThan, pgsql.OperatorGreaterThan, pgsql.OperatorLessThanOrEqualTo, pgsql.OperatorGreaterThanOrEqualTo: | ||
| s.addSelectivity(selectivityWeightRangeComparison) | ||
|
|
||
| case pgsql.OperatorLike, pgsql.OperatorILike, pgsql.OperatorRegexMatch, pgsql.OperatorSimilarTo: | ||
| s.addSelectivity(selectivityWeightStringSearch) | ||
|
|
||
| case pgsql.OperatorIn, pgsql.OperatorEquals, pgsql.OperatorIs: | ||
| s.addSelectivity(selectivityWeightNarrowSearch) | ||
|
|
||
| case pgsql.OperatorPGArrayOverlap, pgsql.OperatorArrayOverlap: | ||
| s.addSelectivity(selectivityWeightNarrowSearch) |
There was a problem hiding this comment.
Only award the ID point-lookup bonus on point predicates.
This adds selectivityWeightEntityIDReference before checking the operator, so broad predicates like n.id <> 1 or n.id > 1 score like unique lookups. That can skew traversal flipping and bidirectional anchor selection.
Suggested fix
case *pgsql.BinaryExpression:
var (
lOperandIsID = isColumnIDRef(typedNode.LOperand)
rOperandIsID = isColumnIDRef(typedNode.ROperand)
)
-
- if lOperandIsID && !rOperandIsID {
- // Point lookup: n0.id = <literal or param>; highly selective.
- s.addSelectivity(selectivityWeightEntityIDReference)
- } else if rOperandIsID && !lOperandIsID {
- // Canonically unusual, but handle it the same.
- s.addSelectivity(selectivityWeightEntityIDReference)
- }
// If both sides are ID refs, this is a join condition; do not score as a point lookup.
switch typedNode.Operator {
case pgsql.OperatorOr:
s.addSelectivity(selectivityWeightDisjunction)
@@
- case pgsql.OperatorIn, pgsql.OperatorEquals, pgsql.OperatorIs:
+ case pgsql.OperatorIn, pgsql.OperatorEquals:
s.addSelectivity(selectivityWeightNarrowSearch)
+ if (lOperandIsID && !rOperandIsID) || (rOperandIsID && !lOperandIsID) {
+ s.addSelectivity(selectivityWeightEntityIDReference)
+ }
+
+ case pgsql.OperatorIs:
+ s.addSelectivity(selectivityWeightNarrowSearch)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| case *pgsql.BinaryExpression: | |
| var ( | |
| lOperandIsID = isColumnIDRef(typedNode.LOperand) | |
| rOperandIsID = isColumnIDRef(typedNode.ROperand) | |
| ) | |
| if lOperandIsID && !rOperandIsID { | |
| // Point lookup: n0.id = <literal or param>; highly selective. | |
| s.addSelectivity(selectivityWeightEntityIDReference) | |
| } else if rOperandIsID && !lOperandIsID { | |
| // Canonically unusual, but handle it the same. | |
| s.addSelectivity(selectivityWeightEntityIDReference) | |
| } | |
| // If both sides are ID refs, this is a join condition; do not score as a point lookup. | |
| switch typedNode.Operator { | |
| case pgsql.OperatorOr: | |
| s.addSelectivity(selectivityWeightDisjunction) | |
| case pgsql.OperatorNotEquals: | |
| s.addSelectivity(selectivityWeightNotEquals) | |
| case pgsql.OperatorAnd: | |
| s.addSelectivity(selectivityWeightConjunction) | |
| case pgsql.OperatorLessThan, pgsql.OperatorGreaterThan, pgsql.OperatorLessThanOrEqualTo, pgsql.OperatorGreaterThanOrEqualTo: | |
| s.addSelectivity(selectivityWeightRangeComparison) | |
| case pgsql.OperatorLike, pgsql.OperatorILike, pgsql.OperatorRegexMatch, pgsql.OperatorSimilarTo: | |
| s.addSelectivity(selectivityWeightStringSearch) | |
| case pgsql.OperatorIn, pgsql.OperatorEquals, pgsql.OperatorIs: | |
| s.addSelectivity(selectivityWeightNarrowSearch) | |
| case pgsql.OperatorPGArrayOverlap, pgsql.OperatorArrayOverlap: | |
| s.addSelectivity(selectivityWeightNarrowSearch) | |
| case *pgsql.BinaryExpression: | |
| var ( | |
| lOperandIsID = isColumnIDRef(typedNode.LOperand) | |
| rOperandIsID = isColumnIDRef(typedNode.ROperand) | |
| ) | |
| // If both sides are ID refs, this is a join condition; do not score as a point lookup. | |
| switch typedNode.Operator { | |
| case pgsql.OperatorOr: | |
| s.addSelectivity(selectivityWeightDisjunction) | |
| case pgsql.OperatorNotEquals: | |
| s.addSelectivity(selectivityWeightNotEquals) | |
| case pgsql.OperatorAnd: | |
| s.addSelectivity(selectivityWeightConjunction) | |
| case pgsql.OperatorLessThan, pgsql.OperatorGreaterThan, pgsql.OperatorLessThanOrEqualTo, pgsql.OperatorGreaterThanOrEqualTo: | |
| s.addSelectivity(selectivityWeightRangeComparison) | |
| case pgsql.OperatorLike, pgsql.OperatorILike, pgsql.OperatorRegexMatch, pgsql.OperatorSimilarTo: | |
| s.addSelectivity(selectivityWeightStringSearch) | |
| case pgsql.OperatorIn, pgsql.OperatorEquals: | |
| s.addSelectivity(selectivityWeightNarrowSearch) | |
| if (lOperandIsID && !rOperandIsID) || (rOperandIsID && !lOperandIsID) { | |
| s.addSelectivity(selectivityWeightEntityIDReference) | |
| } | |
| case pgsql.OperatorIs: | |
| s.addSelectivity(selectivityWeightNarrowSearch) | |
| case pgsql.OperatorPGArrayOverlap, pgsql.OperatorArrayOverlap: | |
| s.addSelectivity(selectivityWeightNarrowSearch) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cypher/models/pgsql/optimize/selectivity.go` around lines 170 - 205, The code
currently grants selectivityWeightEntityIDReference as soon as one side is an ID
ref (in BinaryExpression handling), which wrongly treats non-point operators
(e.g., >, <, !=) as point-lookups; change it so the ID point-lookup bonus is
only applied for true point operators: compute lOperandIsID/rOperandIsID using
isColumnIDRef(typedNode.LOperand/ROperand) as before, but remove the
unconditional s.addSelectivity(selectivityWeightEntityIDReference) calls and
instead, inside the operator switch for typedNode.Operator, only call
s.addSelectivity(selectivityWeightEntityIDReference) in the cases that represent
point predicates (e.g., pgsql.OperatorEquals, pgsql.OperatorIn, and
pgsql.OperatorIs) and only when exactly one side is an ID ref (lOperandIsID !=
rOperandIsID); leave joins (both ID refs) and other operator cases unchanged.
| nextQuery.From = []pgsql.FromClause{nextQueryFrom} | ||
| if err := s.appendUnwindSourcesIfReferenced(&nextQuery, expansionModel.EdgeConstraints); err != nil { | ||
| return pgsql.Query{}, nil, err | ||
| } |
There was a problem hiding this comment.
Include UNWIND bindings when satisfaction expressions depend on them.
These attachment checks only look at expansionModel.EdgeConstraints, but both forwardTerminalSatisfaction and backwardTerminalSatisfaction can also inject TerminalNodeSatisfactionProjection / PrimerNodeSatisfactionProjection into the same SELECT. If an UNWIND binding is referenced only from those endpoint predicates, the generated harness fragment still emits that identifier without a matching FROM source. Please decide UNWIND inclusion from the full select body, or at least from both the edge constraints and the satisfaction expression.
Also applies to: 1286-1289, 1375-1378, 1447-1450
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cypher/models/pgsql/translate/expansion.go` around lines 1192 - 1195, The
current logic calls appendUnwindSourcesIfReferenced using only
expansionModel.EdgeConstraints, which misses UNWIND bindings referenced
exclusively by forwardTerminalSatisfaction/backwardTerminalSatisfaction (they
add TerminalNodeSatisfactionProjection/PrimerNodeSatisfactionProjection to the
SELECT). Update the decision to include UNWIND sources by inspecting the full
select body: pass a combined set of referenced projections (edge constraints
plus any projections produced by forwardTerminalSatisfaction and
backwardTerminalSatisfaction) into appendUnwindSourcesIfReferenced (or call it
twice with both sets) so that nextQuery (and other similar call sites) include
UNWIND FROM clauses whenever those endpoint satisfaction projections reference
UNWIND bindings; apply the same change at the other occurrences noted.
| primerQuery, hasPrimerQuery := translation.Parameters["pi0"].(string) | ||
|
|
||
| require.True(t, hasPrimerQuery) |
There was a problem hiding this comment.
Avoid pinning this assertion to pi0.
The parameter name is an internal ordering detail, so this test will fail if another parameter is introduced ahead of the primer query. Look up the primer SQL by value/content instead of hard-coding the key.
Suggested change
- primerQuery, hasPrimerQuery := translation.Parameters["pi0"].(string)
-
- require.True(t, hasPrimerQuery)
+ var primerQuery string
+ for _, value := range translation.Parameters {
+ candidate, ok := value.(string)
+ if ok && strings.Contains(candidate, "unnest(array ['source']::text[]) as i0") {
+ primerQuery = candidate
+ break
+ }
+ }
+
+ require.NotEmpty(t, primerQuery)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| primerQuery, hasPrimerQuery := translation.Parameters["pi0"].(string) | |
| require.True(t, hasPrimerQuery) | |
| var primerQuery string | |
| for _, value := range translation.Parameters { | |
| candidate, ok := value.(string) | |
| if ok && strings.Contains(candidate, "unnest(array ['source']::text[]) as i0") { | |
| primerQuery = candidate | |
| break | |
| } | |
| } | |
| require.NotEmpty(t, primerQuery) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cypher/models/pgsql/translate/optimizer_safety_test.go` around lines 421 -
423, Test currently assumes the primer SQL is stored under the key "pi0" (using
translation.Parameters["pi0"] and hasPrimerQuery); instead, search
translation.Parameters for the primer SQL by value/content (iterate
translation.Parameters map, type-assert values to string and compare to the
expected primer SQL text), assert that a matching entry exists, capture that
value into primerQuery for subsequent assertions, and remove the hard-coded
"pi0" key usage so the test no longer depends on internal parameter ordering.
Description
Don't open. Dead inside.
Resolves: <TICKET_OR_ISSUE_NUMBER>
Type of Change
Testing
make test_allwithCONNECTION_STRINGset)Screenshots (if appropriate):
Driver Impact
drivers/pg)drivers/neo4j)Checklist
go.mod/go.sumare up to date if dependencies changedSummary by CodeRabbit
New Features
-explainflag to benchmark tool to capture PostgreSQL execution plan diagnostics.Improvements