Skip to content

CySQL Query Optimizer Framework#88

Open
zinic wants to merge 47 commits into
SpecterOps:mainfrom
zinic:optimizer
Open

CySQL Query Optimizer Framework#88
zinic wants to merge 47 commits into
SpecterOps:mainfrom
zinic:optimizer

Conversation

@zinic
Copy link
Copy Markdown
Contributor

@zinic zinic commented May 22, 2026

Description

Don't open. Dead inside.

Resolves: <TICKET_OR_ISSUE_NUMBER>

Type of Change

  • Chore (a change that does not modify the application functionality)
  • Bug fix (a change that fixes an issue)
  • New feature / enhancement (a change that adds new functionality)
  • Refactor (no behaviour change)
  • Test coverage
  • Build / CI / tooling
  • Documentation

Testing

  • Unit tests added / updated
  • Integration tests added / updated
  • Full test suite run (make test_all with CONNECTION_STRING set)

Screenshots (if appropriate):

Driver Impact

  • PostgreSQL driver (drivers/pg)
  • Neo4j driver (drivers/neo4j)

Checklist

  • Code is formatted
  • All existing tests pass
  • go.mod / go.sum are up to date if dependencies changed

Summary by CodeRabbit

  • New Features

    • Added -explain flag to benchmark tool to capture PostgreSQL execution plan diagnostics.
    • Enhanced benchmark output with additional metrics: distinct row counts, duplicate row counts, and explain status.
  • Improvements

    • Optimized query execution with improved path materialization, predicate placement, and relationship expansion strategies.
    • Strengthened query analysis to better identify optimization opportunities across complex patterns.

Review Change Stack

zinic added 30 commits May 21, 2026 22:55
Late-materialize variable-length path edge arrays from compact expansion path ID arrays.

Avoid carrying expansion edge composites through intermediate projections unless the relationship binding itself remains observable.
Represent fixed relationships used only by path bindings as scalar edge IDs through intermediate projections.

Reconstruct edge composites from those IDs only when path values or path relationship arrays are needed, while keeping directly referenced relationship variables materialized as full composites.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

Walkthrough

Adds a PG Cypher optimizer (analysis, rules, lowering) and integrates its decisions into translation. Introduces new AST nodes/utilities, regenerates SQL fixtures, expands integration tests, and enhances the benchmark tool to capture PostgreSQL EXPLAIN with optimizer metadata.

Changes

Benchmark EXPLAIN capture and reporting

Layer / File(s) Summary
Benchmark CLI, runner, reporting, and README
cmd/benchmark/*
Adds -explain flag, ExplainResult/newPostgresExplainer, iterations validation, warm-up row measurements, richer per-scenario output, Markdown/JSON updates, and tests.

Cypher AST deep-copy and helpers

Layer / File(s) Summary
Deep-copy implementations and tests
cypher/models/cypher/*
Adds copy() for RangeQuantifier/Properties/literals; deep-copies Variables; nil-safe helpers; tests for deep copy and nil handling.

PGSQL AST additions, formatting, walking, and renaming

Layer / File(s) Summary
New node, formatter, walker, renamer, and typing
cypher/models/pgsql/*
Adds EdgeArrayFromPathIDs and formatting; walker/renamer support; introduces PathEdge type; identifier/type utilities and tests.

Optimizer core: analysis, locality, lowering plan, rules, and tests

Layer / File(s) Summary
Analysis model, locality utilities, lowering plan
cypher/models/pgsql/optimize/*
Implements analysis, locality checks, and lowering plan decisions (projection pruning, expand-into, direction, shortest-path, pushdowns, predicate placement).
Rule engine, predicate attachment, reordering, selectivity, source refs, and tests
cypher/models/pgsql/optimize/*
Adds rule framework, predicate attachment, conservative reordering, selectivity model, source-identifier references, with extensive tests.

Translator integration with optimizer decisions and new flows

Layer / File(s) Summary
Translator wiring, decisions, traversal/expansion, projection, predicates, functions, with/model/match
cypher/models/pgsql/translate/*
Wires optimizer plan; implements expand-into, suffix pushdown, direction and shortest-path strategies, limit pushdown; null-safe path composites; predicate-existence placement; path component/with resolution; tests.

SQL translation test fixtures regenerated

Layer / File(s) Summary
Updated SQL fixtures for create/delete/multipart/.../update
cypher/models/pgsql/test/translation_cases/*
Regenerates SQL to use ordered_edges_to_path with CASE null-guards, id-inequality constraints, and updated recursion columns across many cases.

Integration tests, data fixtures, and metrics allowlist

Layer / File(s) Summary
Integration assertions, cases, data, and harness
integration/*, tools/metrics/*
Adds ADCS fanout data; new/updated inline cases; new relationship_list_kinds assertion; harness skips without DB; metrics allowlist update.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Optimizer
  participant Translator
  participant PostgreSQL
  Client->>Optimizer: Optimize(cypher)
  Optimizer-->>Translator: Plan (rules, attachments, lowerings)
  Translator->>PostgreSQL: Execute generated SQL
  PostgreSQL-->>Translator: Rows/Plan
  Translator-->>Client: Results + Optimization summary
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • SpecterOps/DAWGS#73: Touches predicate-existence optimization paths overlapping updated predicate.go/traversal changes.
  • SpecterOps/DAWGS#85: Modifies OptimizePatternConstraintBalance, overlapping this PR’s refactor to delegate flip decisions.

Suggested labels

enhancement, go

Suggested reviewers

  • urangel
  • kpom-specter

Poem

In tunnels of edges, I hop and I plan,
Weighing each predicate, as only rabbits can.
I flip and I prune, with charts in my paw,
EXPLAINs in my backpack, optimizing by law.
Paths line up neatly—tick-tock, no dread—
Then I press run: carrots up, green ahead! 🥕

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (2)
cypher/models/pgsql/translate/tracking.go (1)

347-353: ⚡ Quick win

Make LookupDataType alias-aware.

This helper goes straight to Lookup, so LookupDataType(alias) returns false for identifiers that AliasedLookup/LookupString already resolve. That makes the new exported API easy to misuse anywhere aliased bindings are visible.

Possible fix
 func (s *Scope) LookupDataType(identifier pgsql.Identifier) (pgsql.DataType, bool) {
-	if binding, bound := s.Lookup(identifier); !bound {
-		return "", false
-	} else {
+	if binding, bound := s.Lookup(identifier); bound {
+		return binding.DataType, true
+	}
+
+	if binding, bound := s.AliasedLookup(identifier); bound {
 		return binding.DataType, true
 	}
+
+	return "", false
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cypher/models/pgsql/translate/tracking.go` around lines 347 - 353,
LookupDataType currently calls s.Lookup directly and thus fails for aliased
identifiers; change it to resolve aliases first (use s.AliasedLookup(identifier)
or s.LookupString on the identifier string equivalent) and return the resolved
binding's DataType and true when found, otherwise return "" and false—i.e.,
replace the s.Lookup call in LookupDataType with an alias-aware lookup using
AliasedLookup/LookupString and return binding.DataType, true when present.
integration/cypher_test.go (1)

235-237: Verify integration execution is scoped by CONNECTION_STRING.

Before running make test_all for these integration additions, ensure CONNECTION_STRING is set and that only the backend selected by its scheme is exercised while other backend tests self-skip.
As per coding guidelines, "Integration suites must consume CONNECTION_STRING environment variable. If CONNECTION_STRING is not present in the LLM context, ask the user to add it before running make test_all" and "Run make test_all only for the backend selected by the scheme in CONNECTION_STRING."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integration/cypher_test.go` around lines 235 - 237, The integration tests
must be gated by the CONNECTION_STRING environment setting: add a setup check
(preferably in the test package's TestMain or the top-level test runner that
invokes the switch handling cases like the one that calls
assertRelationshipListKinds and decodeAssertionValue) that reads
CONNECTION_STRING, if it's missing call t.Skipf or return an error instructing
the user to set CONNECTION_STRING before running make test_all, and if present
parse the URI scheme (text before "://") and use that value to only run tests
for the selected backend (skip other backend-specific tests with t.Skipf when
the scheme does not match); ensure this gating is applied globally so each case
(e.g., relationship_list_kinds) self-skips when the scheme is not applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cypher/models/pgsql/optimize/locality.go`:
- Around line 111-145: The function SelectReferencesOnlyLocalIdentifiers
currently seeds all from/join aliases up front via AddFromClauseBindings, which
lets an ON constraint see later join aliases as local; change the algorithm to
validate JOIN constraints against an incremental scope instead: copy localScope
into scopedIdentifiers, call AddFromClauseBindings only for the from clause
source (or add the source alias binding) then for each join in fromClause.Joins
validate join.Table with FromExpressionReferencesOnlyLocalIdentifiers and
validate join.JoinOperator.Constraint with
ExpressionReferencesOnlyLocalIdentifiers using the current scopedIdentifiers,
and only after those checks add the join's alias/bindings into scopedIdentifiers
(mirroring how AddFromClauseBindings would add them) so later joins see earlier
aliases but not vice versa.

In `@cypher/models/pgsql/optimize/lowering_plan.go`:
- Around line 611-640: The loop that appends LimitPushdownDecision entries is
missing a guard to skip OPTIONAL MATCH clauses; update the loop over
readingClauses in lowering_plan.go to check the clause's optionality (e.g.
readingClause.Optional or readingClause.OptionalMatch) right after the existing
nil/Match nil checks and continue when true so no LimitPushdown entries are
added for optional matches; keep the existing logic that computes patternTarget,
traversalStepsForPattern,
LimitPushdownTraversalCTE/LimitPushdownShortestPathHarness, and appends to
plan.LimitPushdown unchanged for non-optional clauses.

In `@cypher/models/pgsql/optimize/selectivity.go`:
- Around line 112-117: popSelectivity currently calls Selectivity(), which
returns the root frame (s.selectivityStack[0]), causing pops to return the wrong
value; update popSelectivity in measureSelectivityVisitor to read the top frame
directly (value := s.selectivityStack[len(s.selectivityStack)-1]) before slicing
the stack down (s.selectivityStack = s.selectivityStack[:len-1]) so it returns
the top-most frame for nested scopes and NOT(...) computations instead of the
root.

In `@cypher/models/pgsql/optimize/source_references.go`:
- Around line 85-87: The branch handling *cypher.Variable in
source_references.go incorrectly treats declaration-context variables as
references; update the case in the switch so that it calls
s.addVariable(typedNode) only when matchPatternDeclarationDepth == 0 (i.e., the
variable is not inside a match pattern declaration). In other words, guard the
addVariable call with a check of s.matchPatternDeclarationDepth (or the named
variable/method used for that depth) and skip marking variables as referenced
when inside a declaration context.

In `@cypher/models/pgsql/test/translation_cases/pattern_binding.sql`:
- Around line 89-90: The query recomputes ordered_edges_to_path(...) multiple
times when evaluating NONE over TAIL(TAIL(NODES(p))), causing repeated edge
scans; refactor by materializing the path once in an inner CTE (stage the
matched path as p or a CTE column e.g. p_path / ep0 inside s0/s1) using
ordered_edges_to_path(...) and expose its (p).nodes as a column, then in the
outer query reference that materialized p.nodes for the NONE(...) /
TAIL(TAIL(...)) predicate instead of re-invoking ordered_edges_to_path(...)
repeatedly.

In `@cypher/models/pgsql/translate/expansion.go`:
- Around line 2304-2306: The UNWIND sources collected via
unwindClauses/unwindFromClauses are only attached in buildExpansionPatternRoot
and are not propagated for shortest-path roots built by
buildShortestPathsExpansionPattern/Build*ShortestPathsRoot, causing UNWIND ...
MATCH p = shortestPath(...) to lose row sources/bindings; refactor so the
unwindSources plumbing is applied in the shared root-builder path used by both
buildExpansionPatternRoot and
buildShortestPathsExpansionPattern/Build*ShortestPathsRoot (move the attach
logic into the common root construction function), ensure seed
predicates/binding include the unwindSources there, and add a regression test
exercising UNWIND ... MATCH p = shortestPath(...) to validate the fix.

In `@cypher/models/pgsql/translate/model.go`:
- Around line 170-181: The guard in canMaterializeEndpointPairFilterForStep
currently uses hasLocalEndpointConstraint which allows label-only predicates;
change both endpoint checks to require pair-aware predicates by calling
hasPairAwareEndpointConstraint (or equivalent search-column check) for
expansionModel.PrimerNodeConstraints against traversalStep.LeftNode.Identifier
and expansionModel.TerminalNodeConstraints against
traversalStep.RightNode.Identifier so materializedEndpointPairFilterStatement
only runs when both endpoints supply pair-aware constraints and avoid
enumerating explosive cross-products.

In `@cypher/models/pgsql/translate/predicate.go`:
- Around line 136-149: The current code returns from buildPatternPredicates
after handling one optimized existence-placement predicate (inside the block
that calls s.usePatternPredicateExistencePlacement and
s.buildOptimizedRelationshipExistPredicate), which prevents later predicates in
the same patternPart from being processed; instead, after assigning
predicateFuture.SyntaxNode and calling
s.recordLowering(optimize.LoweringPredicatePlacement) do not return nil—skip to
the next predicate in the loop (e.g., replace the early return with a continue
or otherwise let the loop continue) so remaining predicates are built and
translated; keep the existing calls to s.usePatternPredicateExistencePlacement,
s.buildOptimizedRelationshipExistPredicate, predicateFuture.SyntaxNode
assignment, and s.recordLowering as-is.

---

Nitpick comments:
In `@cypher/models/pgsql/translate/tracking.go`:
- Around line 347-353: LookupDataType currently calls s.Lookup directly and thus
fails for aliased identifiers; change it to resolve aliases first (use
s.AliasedLookup(identifier) or s.LookupString on the identifier string
equivalent) and return the resolved binding's DataType and true when found,
otherwise return "" and false—i.e., replace the s.Lookup call in LookupDataType
with an alias-aware lookup using AliasedLookup/LookupString and return
binding.DataType, true when present.

In `@integration/cypher_test.go`:
- Around line 235-237: The integration tests must be gated by the
CONNECTION_STRING environment setting: add a setup check (preferably in the test
package's TestMain or the top-level test runner that invokes the switch handling
cases like the one that calls assertRelationshipListKinds and
decodeAssertionValue) that reads CONNECTION_STRING, if it's missing call t.Skipf
or return an error instructing the user to set CONNECTION_STRING before running
make test_all, and if present parse the URI scheme (text before "://") and use
that value to only run tests for the selected backend (skip other
backend-specific tests with t.Skipf when the scheme does not match); ensure this
gating is applied globally so each case (e.g., relationship_list_kinds)
self-skips when the scheme is not applicable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 04b35b30-d9d0-4fc9-9ae1-a1cd8ffecbf5

📥 Commits

Reviewing files that changed from the base of the PR and between 9131d97 and b7e84bc.

📒 Files selected for processing (66)
  • cmd/benchmark/README.md
  • cmd/benchmark/explain.go
  • cmd/benchmark/main.go
  • cmd/benchmark/report.go
  • cmd/benchmark/report_test.go
  • cmd/benchmark/runner.go
  • cmd/benchmark/scenarios.go
  • cypher/models/cypher/copy.go
  • cypher/models/cypher/copy_test.go
  • cypher/models/cypher/model.go
  • cypher/models/pgsql/format/format.go
  • cypher/models/pgsql/model.go
  • cypher/models/pgsql/optimize/analysis.go
  • cypher/models/pgsql/optimize/analysis_test.go
  • cypher/models/pgsql/optimize/locality.go
  • cypher/models/pgsql/optimize/lowering.go
  • cypher/models/pgsql/optimize/lowering_plan.go
  • cypher/models/pgsql/optimize/optimizer.go
  • cypher/models/pgsql/optimize/optimizer_test.go
  • cypher/models/pgsql/optimize/pattern_predicates.go
  • cypher/models/pgsql/optimize/reordering.go
  • cypher/models/pgsql/optimize/selectivity.go
  • cypher/models/pgsql/optimize/source_references.go
  • cypher/models/pgsql/pgtypes.go
  • cypher/models/pgsql/test/translation_cases/create.sql
  • cypher/models/pgsql/test/translation_cases/delete.sql
  • cypher/models/pgsql/test/translation_cases/multipart.sql
  • cypher/models/pgsql/test/translation_cases/nodes.sql
  • cypher/models/pgsql/test/translation_cases/pattern_binding.sql
  • cypher/models/pgsql/test/translation_cases/pattern_expansion.sql
  • cypher/models/pgsql/test/translation_cases/pattern_rewriting.sql
  • cypher/models/pgsql/test/translation_cases/quantifiers.sql
  • cypher/models/pgsql/test/translation_cases/scalar_aggregation.sql
  • cypher/models/pgsql/test/translation_cases/shortest_paths.sql
  • cypher/models/pgsql/test/translation_cases/stepwise_traversal.sql
  • cypher/models/pgsql/test/translation_cases/update.sql
  • cypher/models/pgsql/translate/constraints.go
  • cypher/models/pgsql/translate/expansion.go
  • cypher/models/pgsql/translate/function.go
  • cypher/models/pgsql/translate/limit_pushdown_test.go
  • cypher/models/pgsql/translate/match.go
  • cypher/models/pgsql/translate/model.go
  • cypher/models/pgsql/translate/optimizer_safety_test.go
  • cypher/models/pgsql/translate/path_functions.go
  • cypher/models/pgsql/translate/pattern.go
  • cypher/models/pgsql/translate/predicate.go
  • cypher/models/pgsql/translate/projection.go
  • cypher/models/pgsql/translate/renamer.go
  • cypher/models/pgsql/translate/renamer_test.go
  • cypher/models/pgsql/translate/selectivity.go
  • cypher/models/pgsql/translate/tracking.go
  • cypher/models/pgsql/translate/translator.go
  • cypher/models/pgsql/translate/traversal.go
  • cypher/models/pgsql/translate/with.go
  • cypher/models/walk/walk_pgsql.go
  • docs/optimization-pass-memory.md
  • integration/cypher_test.go
  • integration/testdata/adcs_fanout.json
  • integration/testdata/cases/aggregation_inline.json
  • integration/testdata/cases/expansion_inline.json
  • integration/testdata/cases/multipart_inline.json
  • integration/testdata/cases/optimizer_inline.json
  • integration/testdata/cases/pattern_binding_inline.json
  • integration/testdata/cases/unwind_inline.json
  • integration/testdata/templates/pattern_shapes.json
  • tools/metrics/internal/metrics/quality.go
💤 Files with no reviewable changes (1)
  • cypher/models/pgsql/test/translation_cases/pattern_rewriting.sql

Comment thread cypher/models/pgsql/optimize/locality.go
Comment thread cypher/models/pgsql/optimize/lowering_plan.go
Comment thread cypher/models/pgsql/optimize/selectivity.go
Comment thread cypher/models/pgsql/optimize/source_references.go
Comment thread cypher/models/pgsql/test/translation_cases/pattern_binding.sql Outdated
Comment thread cypher/models/pgsql/translate/expansion.go
Comment thread cypher/models/pgsql/translate/model.go Outdated
Comment thread cypher/models/pgsql/translate/predicate.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cypher/models/pgsql/translate/expansion.go (1)

1212-1235: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't cross-join prior UNWIND rows into recursive fronts without a row key.

The recursive front rows here only preserve traversal state, so once appendUnwindSourcesIfReferenced prepends the previous frame and UNWIND sources, every frontier row is evaluated against every matching outer UNWIND row. Two outer rows that share the same endpoints but differ only by the UNWIND-bound value can now leak or duplicate paths across each other. This needs an outer-row discriminator threaded through the front/harness state, or recursive UNWIND-dependent constraints should stay unsupported for now.

Also applies to: 1286-1289, 1388-1403, 1447-1450

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cypher/models/pgsql/translate/expansion.go` around lines 1212 - 1235, The
recursive frontier projection (nextQuery.Projection) is missing an outer-row
discriminator so UNWIND/outer rows will be cross-joined into recursive fronts;
fix by threading a unique outer-row key through the front and harness state: add
a dedicated discriminator column (e.g., outerRowID or unwindRowKey) to the
projection alongside expansionModel.Frame.Binding.Identifier, expansionRootID,
expansionDepth and expansionPath, ensure appendUnwindSourcesIfReferenced
populates that discriminator when prepending previous frame/UNWIND sources,
include the discriminator in any path concatenations and in
forwardTerminalSatisfaction/terminal projection logic so recursive rows are only
matched to their originating outer UNWIND row, and apply the same change at the
other affected blocks referenced (around the expansions at the other ranges).
🧹 Nitpick comments (1)
integration/harness.go (1)

91-94: Set CONNECTION_STRING before running integration suite

Please add CONNECTION_STRING in your environment before running make test_all, and run only the backend selected by that scheme (other backend tests should skip).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integration/harness.go` around lines 91 - 94, The test harness reads
CONNECTION_STRING into connStr in integration/harness.go and calls t.Skip() when
it's empty; before running make test_all set the CONNECTION_STRING environment
variable to the backend you want to test (e.g., export
CONNECTION_STRING="postgres://...") so connStr is populated and the selected
backend tests run; ensure you only run the backend matching that scheme so other
backend test suites remain skipped by the existing connStr check and t.Skip()
guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cypher/models/pgsql/optimize/locality.go`:
- Around line 63-66: QueryReferencesOnlyLocalIdentifiers currently treats any
non-nil query.CommonTableExpressions as “has CTEs”, but NewQueryPart initializes
queries with an empty &pgsql.With{}, causing false positives; update
QueryReferencesOnlyLocalIdentifiers to detect real CTEs by checking whether
CommonTableExpressions actually contains entries (e.g. test
len(query.CommonTableExpressions.Ctes) > 0 or call an existing IsEmpty/HasCTEs
helper) instead of only checking for nil, so empty pgsql.With{} is treated as no
CTEs and pushdown logic remains enabled.
- Around line 8-14: FlattenConjunction only handles pointer-form nodes
(*pgsql.BinaryExpression), so value-form BinaryExpression nodes are left
unflattened; update FlattenConjunction to recognize both pointer and value forms
by using a type switch (or separate type assertions) to handle
pgsql.BinaryExpression as well as *pgsql.BinaryExpression, check Operator ==
pgsql.OperatorAnd for either case, and recurse on the appropriate
LOperand/ROperand fields (referencing FlattenConjunction,
pgsql.BinaryExpression, OperatorAnd, LOperand, ROperand) so both value and
pointer trees are flattened.

In `@cypher/models/pgsql/optimize/selectivity.go`:
- Around line 170-205: The code currently grants
selectivityWeightEntityIDReference as soon as one side is an ID ref (in
BinaryExpression handling), which wrongly treats non-point operators (e.g., >,
<, !=) as point-lookups; change it so the ID point-lookup bonus is only applied
for true point operators: compute lOperandIsID/rOperandIsID using
isColumnIDRef(typedNode.LOperand/ROperand) as before, but remove the
unconditional s.addSelectivity(selectivityWeightEntityIDReference) calls and
instead, inside the operator switch for typedNode.Operator, only call
s.addSelectivity(selectivityWeightEntityIDReference) in the cases that represent
point predicates (e.g., pgsql.OperatorEquals, pgsql.OperatorIn, and
pgsql.OperatorIs) and only when exactly one side is an ID ref (lOperandIsID !=
rOperandIsID); leave joins (both ID refs) and other operator cases unchanged.

In `@cypher/models/pgsql/translate/expansion.go`:
- Around line 1192-1195: The current logic calls appendUnwindSourcesIfReferenced
using only expansionModel.EdgeConstraints, which misses UNWIND bindings
referenced exclusively by
forwardTerminalSatisfaction/backwardTerminalSatisfaction (they add
TerminalNodeSatisfactionProjection/PrimerNodeSatisfactionProjection to the
SELECT). Update the decision to include UNWIND sources by inspecting the full
select body: pass a combined set of referenced projections (edge constraints
plus any projections produced by forwardTerminalSatisfaction and
backwardTerminalSatisfaction) into appendUnwindSourcesIfReferenced (or call it
twice with both sets) so that nextQuery (and other similar call sites) include
UNWIND FROM clauses whenever those endpoint satisfaction projections reference
UNWIND bindings; apply the same change at the other occurrences noted.

In `@cypher/models/pgsql/translate/optimizer_safety_test.go`:
- Around line 421-423: Test currently assumes the primer SQL is stored under the
key "pi0" (using translation.Parameters["pi0"] and hasPrimerQuery); instead,
search translation.Parameters for the primer SQL by value/content (iterate
translation.Parameters map, type-assert values to string and compare to the
expected primer SQL text), assert that a matching entry exists, capture that
value into primerQuery for subsequent assertions, and remove the hard-coded
"pi0" key usage so the test no longer depends on internal parameter ordering.

---

Outside diff comments:
In `@cypher/models/pgsql/translate/expansion.go`:
- Around line 1212-1235: The recursive frontier projection
(nextQuery.Projection) is missing an outer-row discriminator so UNWIND/outer
rows will be cross-joined into recursive fronts; fix by threading a unique
outer-row key through the front and harness state: add a dedicated discriminator
column (e.g., outerRowID or unwindRowKey) to the projection alongside
expansionModel.Frame.Binding.Identifier, expansionRootID, expansionDepth and
expansionPath, ensure appendUnwindSourcesIfReferenced populates that
discriminator when prepending previous frame/UNWIND sources, include the
discriminator in any path concatenations and in
forwardTerminalSatisfaction/terminal projection logic so recursive rows are only
matched to their originating outer UNWIND row, and apply the same change at the
other affected blocks referenced (around the expansions at the other ranges).

---

Nitpick comments:
In `@integration/harness.go`:
- Around line 91-94: The test harness reads CONNECTION_STRING into connStr in
integration/harness.go and calls t.Skip() when it's empty; before running make
test_all set the CONNECTION_STRING environment variable to the backend you want
to test (e.g., export CONNECTION_STRING="postgres://...") so connStr is
populated and the selected backend tests run; ensure you only run the backend
matching that scheme so other backend test suites remain skipped by the existing
connStr check and t.Skip() guard.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d2be8088-4497-4814-b198-a244cc83c553

📥 Commits

Reviewing files that changed from the base of the PR and between b7e84bc and 2ed6a0b.

📒 Files selected for processing (24)
  • cypher/models/pgsql/optimize/locality.go
  • cypher/models/pgsql/optimize/lowering_plan.go
  • cypher/models/pgsql/optimize/optimizer_test.go
  • cypher/models/pgsql/optimize/selectivity.go
  • cypher/models/pgsql/optimize/source_references.go
  • cypher/models/pgsql/test/translation_cases/multipart.sql
  • cypher/models/pgsql/test/translation_cases/pattern_binding.sql
  • cypher/models/pgsql/test/translation_cases/pattern_expansion.sql
  • cypher/models/pgsql/translate/constraints.go
  • cypher/models/pgsql/translate/constraints_test.go
  • cypher/models/pgsql/translate/expansion.go
  • cypher/models/pgsql/translate/function.go
  • cypher/models/pgsql/translate/function_test.go
  • cypher/models/pgsql/translate/model.go
  • cypher/models/pgsql/translate/optimizer_safety_test.go
  • cypher/models/pgsql/translate/path_functions.go
  • cypher/models/pgsql/translate/pattern.go
  • cypher/models/pgsql/translate/predicate.go
  • cypher/models/pgsql/translate/predicate_test.go
  • cypher/models/pgsql/translate/projection.go
  • cypher/models/pgsql/translate/selectivity.go
  • cypher/models/pgsql/translate/tracking.go
  • cypher/models/pgsql/translate/tracking_test.go
  • integration/harness.go

Comment on lines +8 to +14
// FlattenConjunction collects the leaf operands of a left-recursive AND chain.
func FlattenConjunction(expr pgsql.Expression) []pgsql.Expression {
if bin, typeOK := expr.(*pgsql.BinaryExpression); !typeOK || bin.Operator != pgsql.OperatorAnd {
return []pgsql.Expression{expr}
} else {
return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle value-form pgsql.BinaryExpression here too.

The translator supports both pointer and value binary nodes, but this helper only unwraps *pgsql.BinaryExpression. Value-form A AND B trees stay unsplit, so mixed local/external predicates stop being partitioned and pushed down.

Suggested fix
 func FlattenConjunction(expr pgsql.Expression) []pgsql.Expression {
-	if bin, typeOK := expr.(*pgsql.BinaryExpression); !typeOK || bin.Operator != pgsql.OperatorAnd {
-		return []pgsql.Expression{expr}
-	} else {
-		return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...)
-	}
+	switch bin := expr.(type) {
+	case *pgsql.BinaryExpression:
+		if bin == nil || bin.Operator != pgsql.OperatorAnd {
+			return []pgsql.Expression{expr}
+		}
+		return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...)
+
+	case pgsql.BinaryExpression:
+		if bin.Operator != pgsql.OperatorAnd {
+			return []pgsql.Expression{expr}
+		}
+		return append(FlattenConjunction(bin.LOperand), FlattenConjunction(bin.ROperand)...)
+
+	default:
+		return []pgsql.Expression{expr}
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cypher/models/pgsql/optimize/locality.go` around lines 8 - 14,
FlattenConjunction only handles pointer-form nodes (*pgsql.BinaryExpression), so
value-form BinaryExpression nodes are left unflattened; update
FlattenConjunction to recognize both pointer and value forms by using a type
switch (or separate type assertions) to handle pgsql.BinaryExpression as well as
*pgsql.BinaryExpression, check Operator == pgsql.OperatorAnd for either case,
and recurse on the appropriate LOperand/ROperand fields (referencing
FlattenConjunction, pgsql.BinaryExpression, OperatorAnd, LOperand, ROperand) so
both value and pointer trees are flattened.

Comment on lines +63 to +66
func QueryReferencesOnlyLocalIdentifiers(query pgsql.Query, localScope *pgsql.IdentifierSet) bool {
if query.CommonTableExpressions != nil {
return false
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't treat an empty WITH placeholder as “has CTEs”.

QueryReferencesOnlyLocalIdentifiers rejects every query with a non-nil CommonTableExpressions, but NewQueryPart initializes queries with &pgsql.With{} by default. That makes locality checks fail even when the query has no CTEs and shuts off the pushdown logic that depends on this helper.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cypher/models/pgsql/optimize/locality.go` around lines 63 - 66,
QueryReferencesOnlyLocalIdentifiers currently treats any non-nil
query.CommonTableExpressions as “has CTEs”, but NewQueryPart initializes queries
with an empty &pgsql.With{}, causing false positives; update
QueryReferencesOnlyLocalIdentifiers to detect real CTEs by checking whether
CommonTableExpressions actually contains entries (e.g. test
len(query.CommonTableExpressions.Ctes) > 0 or call an existing IsEmpty/HasCTEs
helper) instead of only checking for nil, so empty pgsql.With{} is treated as no
CTEs and pushdown logic remains enabled.

Comment on lines +170 to +205
case *pgsql.BinaryExpression:
var (
lOperandIsID = isColumnIDRef(typedNode.LOperand)
rOperandIsID = isColumnIDRef(typedNode.ROperand)
)

if lOperandIsID && !rOperandIsID {
// Point lookup: n0.id = <literal or param>; highly selective.
s.addSelectivity(selectivityWeightEntityIDReference)
} else if rOperandIsID && !lOperandIsID {
// Canonically unusual, but handle it the same.
s.addSelectivity(selectivityWeightEntityIDReference)
}

// If both sides are ID refs, this is a join condition; do not score as a point lookup.
switch typedNode.Operator {
case pgsql.OperatorOr:
s.addSelectivity(selectivityWeightDisjunction)

case pgsql.OperatorNotEquals:
s.addSelectivity(selectivityWeightNotEquals)

case pgsql.OperatorAnd:
s.addSelectivity(selectivityWeightConjunction)

case pgsql.OperatorLessThan, pgsql.OperatorGreaterThan, pgsql.OperatorLessThanOrEqualTo, pgsql.OperatorGreaterThanOrEqualTo:
s.addSelectivity(selectivityWeightRangeComparison)

case pgsql.OperatorLike, pgsql.OperatorILike, pgsql.OperatorRegexMatch, pgsql.OperatorSimilarTo:
s.addSelectivity(selectivityWeightStringSearch)

case pgsql.OperatorIn, pgsql.OperatorEquals, pgsql.OperatorIs:
s.addSelectivity(selectivityWeightNarrowSearch)

case pgsql.OperatorPGArrayOverlap, pgsql.OperatorArrayOverlap:
s.addSelectivity(selectivityWeightNarrowSearch)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only award the ID point-lookup bonus on point predicates.

This adds selectivityWeightEntityIDReference before checking the operator, so broad predicates like n.id <> 1 or n.id > 1 score like unique lookups. That can skew traversal flipping and bidirectional anchor selection.

Suggested fix
 	case *pgsql.BinaryExpression:
 		var (
 			lOperandIsID = isColumnIDRef(typedNode.LOperand)
 			rOperandIsID = isColumnIDRef(typedNode.ROperand)
 		)
-
-		if lOperandIsID && !rOperandIsID {
-			// Point lookup: n0.id = <literal or param>; highly selective.
-			s.addSelectivity(selectivityWeightEntityIDReference)
-		} else if rOperandIsID && !lOperandIsID {
-			// Canonically unusual, but handle it the same.
-			s.addSelectivity(selectivityWeightEntityIDReference)
-		}
 
 		// If both sides are ID refs, this is a join condition; do not score as a point lookup.
 		switch typedNode.Operator {
 		case pgsql.OperatorOr:
 			s.addSelectivity(selectivityWeightDisjunction)
@@
-		case pgsql.OperatorIn, pgsql.OperatorEquals, pgsql.OperatorIs:
+		case pgsql.OperatorIn, pgsql.OperatorEquals:
 			s.addSelectivity(selectivityWeightNarrowSearch)
+			if (lOperandIsID && !rOperandIsID) || (rOperandIsID && !lOperandIsID) {
+				s.addSelectivity(selectivityWeightEntityIDReference)
+			}
+
+		case pgsql.OperatorIs:
+			s.addSelectivity(selectivityWeightNarrowSearch)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case *pgsql.BinaryExpression:
var (
lOperandIsID = isColumnIDRef(typedNode.LOperand)
rOperandIsID = isColumnIDRef(typedNode.ROperand)
)
if lOperandIsID && !rOperandIsID {
// Point lookup: n0.id = <literal or param>; highly selective.
s.addSelectivity(selectivityWeightEntityIDReference)
} else if rOperandIsID && !lOperandIsID {
// Canonically unusual, but handle it the same.
s.addSelectivity(selectivityWeightEntityIDReference)
}
// If both sides are ID refs, this is a join condition; do not score as a point lookup.
switch typedNode.Operator {
case pgsql.OperatorOr:
s.addSelectivity(selectivityWeightDisjunction)
case pgsql.OperatorNotEquals:
s.addSelectivity(selectivityWeightNotEquals)
case pgsql.OperatorAnd:
s.addSelectivity(selectivityWeightConjunction)
case pgsql.OperatorLessThan, pgsql.OperatorGreaterThan, pgsql.OperatorLessThanOrEqualTo, pgsql.OperatorGreaterThanOrEqualTo:
s.addSelectivity(selectivityWeightRangeComparison)
case pgsql.OperatorLike, pgsql.OperatorILike, pgsql.OperatorRegexMatch, pgsql.OperatorSimilarTo:
s.addSelectivity(selectivityWeightStringSearch)
case pgsql.OperatorIn, pgsql.OperatorEquals, pgsql.OperatorIs:
s.addSelectivity(selectivityWeightNarrowSearch)
case pgsql.OperatorPGArrayOverlap, pgsql.OperatorArrayOverlap:
s.addSelectivity(selectivityWeightNarrowSearch)
case *pgsql.BinaryExpression:
var (
lOperandIsID = isColumnIDRef(typedNode.LOperand)
rOperandIsID = isColumnIDRef(typedNode.ROperand)
)
// If both sides are ID refs, this is a join condition; do not score as a point lookup.
switch typedNode.Operator {
case pgsql.OperatorOr:
s.addSelectivity(selectivityWeightDisjunction)
case pgsql.OperatorNotEquals:
s.addSelectivity(selectivityWeightNotEquals)
case pgsql.OperatorAnd:
s.addSelectivity(selectivityWeightConjunction)
case pgsql.OperatorLessThan, pgsql.OperatorGreaterThan, pgsql.OperatorLessThanOrEqualTo, pgsql.OperatorGreaterThanOrEqualTo:
s.addSelectivity(selectivityWeightRangeComparison)
case pgsql.OperatorLike, pgsql.OperatorILike, pgsql.OperatorRegexMatch, pgsql.OperatorSimilarTo:
s.addSelectivity(selectivityWeightStringSearch)
case pgsql.OperatorIn, pgsql.OperatorEquals:
s.addSelectivity(selectivityWeightNarrowSearch)
if (lOperandIsID && !rOperandIsID) || (rOperandIsID && !lOperandIsID) {
s.addSelectivity(selectivityWeightEntityIDReference)
}
case pgsql.OperatorIs:
s.addSelectivity(selectivityWeightNarrowSearch)
case pgsql.OperatorPGArrayOverlap, pgsql.OperatorArrayOverlap:
s.addSelectivity(selectivityWeightNarrowSearch)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cypher/models/pgsql/optimize/selectivity.go` around lines 170 - 205, The code
currently grants selectivityWeightEntityIDReference as soon as one side is an ID
ref (in BinaryExpression handling), which wrongly treats non-point operators
(e.g., >, <, !=) as point-lookups; change it so the ID point-lookup bonus is
only applied for true point operators: compute lOperandIsID/rOperandIsID using
isColumnIDRef(typedNode.LOperand/ROperand) as before, but remove the
unconditional s.addSelectivity(selectivityWeightEntityIDReference) calls and
instead, inside the operator switch for typedNode.Operator, only call
s.addSelectivity(selectivityWeightEntityIDReference) in the cases that represent
point predicates (e.g., pgsql.OperatorEquals, pgsql.OperatorIn, and
pgsql.OperatorIs) and only when exactly one side is an ID ref (lOperandIsID !=
rOperandIsID); leave joins (both ID refs) and other operator cases unchanged.

Comment on lines 1192 to +1195
nextQuery.From = []pgsql.FromClause{nextQueryFrom}
if err := s.appendUnwindSourcesIfReferenced(&nextQuery, expansionModel.EdgeConstraints); err != nil {
return pgsql.Query{}, nil, err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include UNWIND bindings when satisfaction expressions depend on them.

These attachment checks only look at expansionModel.EdgeConstraints, but both forwardTerminalSatisfaction and backwardTerminalSatisfaction can also inject TerminalNodeSatisfactionProjection / PrimerNodeSatisfactionProjection into the same SELECT. If an UNWIND binding is referenced only from those endpoint predicates, the generated harness fragment still emits that identifier without a matching FROM source. Please decide UNWIND inclusion from the full select body, or at least from both the edge constraints and the satisfaction expression.

Also applies to: 1286-1289, 1375-1378, 1447-1450

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cypher/models/pgsql/translate/expansion.go` around lines 1192 - 1195, The
current logic calls appendUnwindSourcesIfReferenced using only
expansionModel.EdgeConstraints, which misses UNWIND bindings referenced
exclusively by forwardTerminalSatisfaction/backwardTerminalSatisfaction (they
add TerminalNodeSatisfactionProjection/PrimerNodeSatisfactionProjection to the
SELECT). Update the decision to include UNWIND sources by inspecting the full
select body: pass a combined set of referenced projections (edge constraints
plus any projections produced by forwardTerminalSatisfaction and
backwardTerminalSatisfaction) into appendUnwindSourcesIfReferenced (or call it
twice with both sets) so that nextQuery (and other similar call sites) include
UNWIND FROM clauses whenever those endpoint satisfaction projections reference
UNWIND bindings; apply the same change at the other occurrences noted.

Comment on lines +421 to +423
primerQuery, hasPrimerQuery := translation.Parameters["pi0"].(string)

require.True(t, hasPrimerQuery)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid pinning this assertion to pi0.

The parameter name is an internal ordering detail, so this test will fail if another parameter is introduced ahead of the primer query. Look up the primer SQL by value/content instead of hard-coding the key.

Suggested change
-	primerQuery, hasPrimerQuery := translation.Parameters["pi0"].(string)
-
-	require.True(t, hasPrimerQuery)
+	var primerQuery string
+	for _, value := range translation.Parameters {
+		candidate, ok := value.(string)
+		if ok && strings.Contains(candidate, "unnest(array ['source']::text[]) as i0") {
+			primerQuery = candidate
+			break
+		}
+	}
+
+	require.NotEmpty(t, primerQuery)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
primerQuery, hasPrimerQuery := translation.Parameters["pi0"].(string)
require.True(t, hasPrimerQuery)
var primerQuery string
for _, value := range translation.Parameters {
candidate, ok := value.(string)
if ok && strings.Contains(candidate, "unnest(array ['source']::text[]) as i0") {
primerQuery = candidate
break
}
}
require.NotEmpty(t, primerQuery)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cypher/models/pgsql/translate/optimizer_safety_test.go` around lines 421 -
423, Test currently assumes the primer SQL is stored under the key "pi0" (using
translation.Parameters["pi0"] and hasPrimerQuery); instead, search
translation.Parameters for the primer SQL by value/content (iterate
translation.Parameters map, type-assert values to string and compare to the
expected primer SQL text), assert that a matching entry exists, capture that
value into primerQuery for subsequent assertions, and remove the hard-coded
"pi0" key usage so the test no longer depends on internal parameter ordering.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant