Skip to content

Restore contsel/contjoinsel for containment & key-existence operators (#2356)#2417

Open
crprashant wants to merge 1 commit intoapache:masterfrom
crprashant:fix/2356-restore-contsel-for-containment
Open

Restore contsel/contjoinsel for containment & key-existence operators (#2356)#2417
crprashant wants to merge 1 commit intoapache:masterfrom
crprashant:fix/2356-restore-contsel-for-containment

Conversation

@crprashant
Copy link
Copy Markdown
Contributor

Restore contsel/contjoinsel for containment & key-existence operators

Closes #2356.

Why

The containment (@>, <@, @>>, <<@) and key-existence (?, ?|, ?&) operators on agtype are bound to matchingsel / matchingjoinsel. matchingsel is intended for pattern operators (LIKE/regex) — during planning it invokes the operator's underlying function (agtype_contains) once per pg_statistic MCV entry. With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries.

PostgreSQL core binds jsonb's analogous operators (@>, <@, ? on jsonb) to contsel / contjoinsel for exactly this reason. This PR restores that precedent for agtype.

What

  • sql/agtype_operators.sql, sql/agtype_exists.sql: 10 operators flipped to contsel / contjoinsel.
  • age--1.7.0--y.y.y.sql: appended ALTER OPERATOR ... SET (RESTRICT, JOIN) for all 10 operators so existing installs flip on ALTER EXTENSION age UPDATE.
  • regress/sql/containment_selectivity.sql: new test that pins the bindings via pg_operator, plus a no-leaked-matchingsel aggregate guard and functional smoke for all 10 operators.
  • regress/expected/cypher_match.out, regress/expected/cypher_vle.out: refresh expected — test_enable_containment now picks Nested Loop + Index Only Scans over Seq Scan/Hash Join (a direct, better plan), and two MATCH p=... and show_list_use_vle queries flip row order (they have no ORDER BY; result set unchanged).
  • Makefile: register the new test in REGRESS.

Validation

Regression suite

36/37 pass with EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm". Only age_upgrade fails — pre-existing on master at 774e781b (verified by git stash && installcheck baseline of 32/33 with the same age_upgrade failure).

Reporter's exact methodology — reproduced

Reporter's three scripts (generate_graph.sql, setup_func_for_workload.sql, workload_select.sql) used unchanged. Run on PG18 with default_statistics_target = 1000 to populate MCV lists, matching the reporter's analyzed-graph conditions:

Metric matchingsel contsel Delta
EXPLAIN planning time 1.42 ms 0.97 ms −32%
EXPLAIN execution time 0.34 ms 0.31 ms ~0%
pgbench TPS (8 clients × 30s) 5247 7378 +40.6%

The −32% planning-time delta lines up with the reporter's "~30% of execution time spent in agtype_contains during planning" observation; the +40.6% TPS gain matches the "severe TPS drop in OLTP-style workloads" they reported.

Upgrade path

Validated end-to-end during the benchmark: operator bindings were flipped from matchingselcontsel via the same ALTER OPERATOR statements the upgrade SQL ships, while operators remained functional throughout.

pg_operator snapshots

-- Before (matchingsel binding shipped on master)
 oprname |  lhs   |  rhs   |  restrict_fn |    join_fn
---------+--------+--------+--------------+------------------
 <<@     | agtype | agtype | matchingsel  | matchingjoinsel
 <@      | agtype | agtype | matchingsel  | matchingjoinsel
 @>      | agtype | agtype | matchingsel  | matchingjoinsel
 @>>     | agtype | agtype | matchingsel  | matchingjoinsel
 ?       | agtype | agtype | matchingsel  | matchingjoinsel
 ?       | agtype | text   | matchingsel  | matchingjoinsel
 ?&      | agtype | agtype | matchingsel  | matchingjoinsel
 ?&      | agtype | text[] | matchingsel  | matchingjoinsel
 ?|      | agtype | agtype | matchingsel  | matchingjoinsel
 ?|      | agtype | text[] | matchingsel  | matchingjoinsel

-- After (this PR)
 oprname |  lhs   |  rhs   | restrict_fn |   join_fn
---------+--------+--------+-------------+-------------
 <<@     | agtype | agtype | contsel     | contjoinsel
 <@      | agtype | agtype | contsel     | contjoinsel
 @>      | agtype | agtype | contsel     | contjoinsel
 @>>     | agtype | agtype | contsel     | contjoinsel
 ?       | agtype | agtype | contsel     | contjoinsel
 ?       | agtype | text   | contsel     | contjoinsel
 ?&      | agtype | agtype | contsel     | contjoinsel
 ?&      | agtype | text[] | contsel     | contjoinsel
 ?|      | agtype | agtype | contsel     | contjoinsel
 ?|      | agtype | text[] | contsel     | contjoinsel

Driver workflows

Intentionally not run: this PR only adjusts pg_operator selectivity metadata. There is no C code, type, or wire-protocol change that python / go / node / JDBC drivers could observe.

Notes for reviewers

  • matchingsel does provide better estimates when good statistics exist on heavily-analyzed agtype columns. PostgreSQL core accepts the same trade-off for jsonb. A future improvement (out of scope here) would be a custom agtype_contains_selectivity mirroring jsonb_sel; happy to file as a follow-up if there's interest.
  • The containment_selectivity regression test is intentionally minimal so the diff is loud and precise if anyone re-introduces matchingsel here. The aggregate guard catches future operator additions that forget the right helper.

@crprashant
Copy link
Copy Markdown
Contributor Author

Self-review caught a bug in the smoke test: SELECT ...::agtype ? 'a' was being parsed as the (agtype, agtype) overload (so 'a' was treated as an agtype literal — invalid syntax), producing an ERROR rather than verifying the (agtype, text) binding works. Added an explicit ::text cast on the right-hand side and regenerated the expected output.

Force-pushed amend 50beb10. Full installcheck still 36/37 (only pre-existing age_upgrade fails on master too).

@crprashant crprashant force-pushed the fix/2356-restore-contsel-for-containment branch from 50beb10 to bd52fd1 Compare April 30, 2026 23:18
@jrgemignani jrgemignani requested review from MuhammadTahaNaveed, Copilot and jrgemignani and removed request for MuhammadTahaNaveed May 4, 2026 18:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Restores lightweight planner selectivity/joinsel functions (contsel/contjoinsel) for agtype containment and key-existence operators to prevent planner-time regressions caused by matchingsel evaluating operator functions against MCV statistics during planning.

Changes:

  • Rebinds 10 agtype containment/key-existence operators to contsel / contjoinsel in the install SQL.
  • Adds upgrade SQL (ALTER OPERATOR ... SET (RESTRICT, JOIN)) so existing installs flip bindings on extension update.
  • Introduces a regression test that pins the bindings via pg_operator, plus updates expected outputs impacted by the resulting plan/order changes.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
sql/agtype_operators.sql Switches containment operators’ RESTRICT/JOIN to contsel/contjoinsel.
sql/agtype_exists.sql Switches key-existence operator overloads’ RESTRICT/JOIN to contsel/contjoinsel.
age--1.7.0--y.y.y.sql Adds upgrade-time ALTER OPERATOR ... SET (RESTRICT, JOIN) for all affected operators.
regress/sql/containment_selectivity.sql Adds regression coverage to pin operator bindings and smoke-test operator behavior.
regress/expected/containment_selectivity.out Expected output for the new regression test.
regress/expected/cypher_match.out Updates expected output for plan/order changes caused by new selectivity bindings.
regress/expected/cypher_vle.out Updates expected output for row order changes (no ORDER BY) under the new plan.
Makefile Registers containment_selectivity in REGRESS.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread regress/expected/cypher_match.out Outdated
Comment thread regress/expected/cypher_vle.out
Comment thread regress/sql/containment_selectivity.sql Outdated
…apache#2356)

The containment (`@>`, `<@`, `@>>`, `<<@`) and key-existence (`?`, `?|`,
`?&`) operators on `agtype` were bound to `matchingsel`/`matchingjoinsel`
on the PG14+ source tree. `matchingsel` is built for pattern operators
(LIKE/regex) and during planning invokes the operator's underlying
function (`agtype_contains`) once per `pg_statistic` MCV. With realistic
statistics targets that produces a planner-time regression that
dominates simple OLTP-style point queries.

Restore the lighter `contsel`/`contjoinsel` helpers used by PostgreSQL
core's jsonb operators (`@>`, `<@`, `?` on jsonb), which matches
upstream's long-standing precedent for the same operator family.

Changes:

* `sql/agtype_operators.sql`, `sql/agtype_exists.sql`: 10 operators
  flipped from `matchingsel`/`matchingjoinsel` to
  `contsel`/`contjoinsel`.
* `age--1.7.0--y.y.y.sql`: appended `ALTER OPERATOR ... SET (RESTRICT,
  JOIN)` for all 10 operators so existing installs flip on
  `ALTER EXTENSION age UPDATE`.
* `regress/sql/containment_selectivity.sql` (+ `expected/.out`): pin
  the bindings via `pg_operator`, plus a "no leaked matchingsel"
  aggregate guard and functional smoke for all 10 operators. The
  guard catches future regressions if a new operator is added without
  the right selectivity helper.
* `regress/expected/cypher_match.out`, `regress/expected/cypher_vle.out`:
  refresh expected to reflect new (and better) plan shapes that the
  lower-selectivity helper produces — `test_enable_containment` now
  picks Nested Loop + Index Only Scans over a Seq Scan/Hash Join, and
  two `MATCH p=...` and `show_list_use_vle` queries flip row order
  (queries had no `ORDER BY`; result set is unchanged, only ordering).
* `Makefile`: register `containment_selectivity` in `REGRESS`.

Validation:

* Build: clean, `-Werror`.
* Regression: 36/37 tests pass under `EXTRA_TESTS="pgvector
  fuzzystrmatch pg_trgm"`. Only `age_upgrade` fails — pre-existing on
  master at 774e781 (verified by `git stash && installcheck`).
* Reporter's exact methodology (LDBC-SNB-style snb_graph + pgbench on
  `bench_message_content`) reproduces the regression and the fix:

  | Metric                     | matchingsel | contsel | Delta |
  |----------------------------|-------------|---------|-------|
  | EXPLAIN planning time (ms) | 1.42        | 0.97    | -32%  |
  | EXPLAIN execution time (ms)| 0.34        | 0.31    | ~equal|
  | pgbench TPS (8c x 30s)     | 5247        | 7378    | +40.6%|

  Run with `default_statistics_target = 1000` to populate MCV lists,
  matching the reporter's analyzed-graph conditions.

* Upgrade path: validated end-to-end during the benchmark — operator
  bindings were flipped from `matchingsel` -> `contsel` via the same
  `ALTER OPERATOR` statements the upgrade SQL ships, while operators
  remained functional throughout.

Driver workflows (python/go/node/jdbc) intentionally not run: this PR
only adjusts pg_operator selectivity metadata. There is no C, type, or
protocol change that drivers could observe.

Closes apache#2356.
@crprashant crprashant force-pushed the fix/2356-restore-contsel-for-containment branch from bd52fd1 to bff7be6 Compare May 4, 2026 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Critical TPS drop in PG 18 branch caused by matchingsel selectivity for @> operator

2 participants