Restore contsel/contjoinsel for containment & key-existence operators (#2356)#2417
Restore contsel/contjoinsel for containment & key-existence operators (#2356)#2417crprashant wants to merge 1 commit intoapache:masterfrom
Conversation
9b27aec to
50beb10
Compare
|
Self-review caught a bug in the smoke test: Force-pushed amend |
50beb10 to
bd52fd1
Compare
There was a problem hiding this comment.
Pull request overview
Restores lightweight planner selectivity/joinsel functions (contsel/contjoinsel) for agtype containment and key-existence operators to prevent planner-time regressions caused by matchingsel evaluating operator functions against MCV statistics during planning.
Changes:
- Rebinds 10
agtypecontainment/key-existence operators tocontsel/contjoinselin the install SQL. - Adds upgrade SQL (
ALTER OPERATOR ... SET (RESTRICT, JOIN)) so existing installs flip bindings on extension update. - Introduces a regression test that pins the bindings via
pg_operator, plus updates expected outputs impacted by the resulting plan/order changes.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| sql/agtype_operators.sql | Switches containment operators’ RESTRICT/JOIN to contsel/contjoinsel. |
| sql/agtype_exists.sql | Switches key-existence operator overloads’ RESTRICT/JOIN to contsel/contjoinsel. |
| age--1.7.0--y.y.y.sql | Adds upgrade-time ALTER OPERATOR ... SET (RESTRICT, JOIN) for all affected operators. |
| regress/sql/containment_selectivity.sql | Adds regression coverage to pin operator bindings and smoke-test operator behavior. |
| regress/expected/containment_selectivity.out | Expected output for the new regression test. |
| regress/expected/cypher_match.out | Updates expected output for plan/order changes caused by new selectivity bindings. |
| regress/expected/cypher_vle.out | Updates expected output for row order changes (no ORDER BY) under the new plan. |
| Makefile | Registers containment_selectivity in REGRESS. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…apache#2356) The containment (`@>`, `<@`, `@>>`, `<<@`) and key-existence (`?`, `?|`, `?&`) operators on `agtype` were bound to `matchingsel`/`matchingjoinsel` on the PG14+ source tree. `matchingsel` is built for pattern operators (LIKE/regex) and during planning invokes the operator's underlying function (`agtype_contains`) once per `pg_statistic` MCV. With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries. Restore the lighter `contsel`/`contjoinsel` helpers used by PostgreSQL core's jsonb operators (`@>`, `<@`, `?` on jsonb), which matches upstream's long-standing precedent for the same operator family. Changes: * `sql/agtype_operators.sql`, `sql/agtype_exists.sql`: 10 operators flipped from `matchingsel`/`matchingjoinsel` to `contsel`/`contjoinsel`. * `age--1.7.0--y.y.y.sql`: appended `ALTER OPERATOR ... SET (RESTRICT, JOIN)` for all 10 operators so existing installs flip on `ALTER EXTENSION age UPDATE`. * `regress/sql/containment_selectivity.sql` (+ `expected/.out`): pin the bindings via `pg_operator`, plus a "no leaked matchingsel" aggregate guard and functional smoke for all 10 operators. The guard catches future regressions if a new operator is added without the right selectivity helper. * `regress/expected/cypher_match.out`, `regress/expected/cypher_vle.out`: refresh expected to reflect new (and better) plan shapes that the lower-selectivity helper produces — `test_enable_containment` now picks Nested Loop + Index Only Scans over a Seq Scan/Hash Join, and two `MATCH p=...` and `show_list_use_vle` queries flip row order (queries had no `ORDER BY`; result set is unchanged, only ordering). * `Makefile`: register `containment_selectivity` in `REGRESS`. Validation: * Build: clean, `-Werror`. * Regression: 36/37 tests pass under `EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm"`. Only `age_upgrade` fails — pre-existing on master at 774e781 (verified by `git stash && installcheck`). * Reporter's exact methodology (LDBC-SNB-style snb_graph + pgbench on `bench_message_content`) reproduces the regression and the fix: | Metric | matchingsel | contsel | Delta | |----------------------------|-------------|---------|-------| | EXPLAIN planning time (ms) | 1.42 | 0.97 | -32% | | EXPLAIN execution time (ms)| 0.34 | 0.31 | ~equal| | pgbench TPS (8c x 30s) | 5247 | 7378 | +40.6%| Run with `default_statistics_target = 1000` to populate MCV lists, matching the reporter's analyzed-graph conditions. * Upgrade path: validated end-to-end during the benchmark — operator bindings were flipped from `matchingsel` -> `contsel` via the same `ALTER OPERATOR` statements the upgrade SQL ships, while operators remained functional throughout. Driver workflows (python/go/node/jdbc) intentionally not run: this PR only adjusts pg_operator selectivity metadata. There is no C, type, or protocol change that drivers could observe. Closes apache#2356.
bd52fd1 to
bff7be6
Compare
Restore contsel/contjoinsel for containment & key-existence operators
Closes #2356.
Why
The containment (
@>,<@,@>>,<<@) and key-existence (?,?|,?&) operators onagtypeare bound tomatchingsel/matchingjoinsel.matchingselis intended for pattern operators (LIKE/regex) — during planning it invokes the operator's underlying function (agtype_contains) once perpg_statisticMCV entry. With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries.PostgreSQL core binds jsonb's analogous operators (
@>,<@,?onjsonb) tocontsel/contjoinselfor exactly this reason. This PR restores that precedent foragtype.What
sql/agtype_operators.sql,sql/agtype_exists.sql: 10 operators flipped tocontsel/contjoinsel.age--1.7.0--y.y.y.sql: appendedALTER OPERATOR ... SET (RESTRICT, JOIN)for all 10 operators so existing installs flip onALTER EXTENSION age UPDATE.regress/sql/containment_selectivity.sql: new test that pins the bindings viapg_operator, plus a no-leaked-matchingsel aggregate guard and functional smoke for all 10 operators.regress/expected/cypher_match.out,regress/expected/cypher_vle.out: refresh expected —test_enable_containmentnow picks Nested Loop + Index Only Scans over Seq Scan/Hash Join (a direct, better plan), and twoMATCH p=...andshow_list_use_vlequeries flip row order (they have noORDER BY; result set unchanged).Makefile: register the new test inREGRESS.Validation
Regression suite
36/37 pass with
EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm". Onlyage_upgradefails — pre-existing on master at774e781b(verified bygit stash && installcheckbaseline of 32/33 with the sameage_upgradefailure).Reporter's exact methodology — reproduced
Reporter's three scripts (
generate_graph.sql,setup_func_for_workload.sql,workload_select.sql) used unchanged. Run on PG18 withdefault_statistics_target = 1000to populate MCV lists, matching the reporter's analyzed-graph conditions:The −32% planning-time delta lines up with the reporter's "~30% of execution time spent in agtype_contains during planning" observation; the +40.6% TPS gain matches the "severe TPS drop in OLTP-style workloads" they reported.
Upgrade path
Validated end-to-end during the benchmark: operator bindings were flipped from
matchingsel→contselvia the sameALTER OPERATORstatements the upgrade SQL ships, while operators remained functional throughout.pg_operatorsnapshotsDriver workflows
Intentionally not run: this PR only adjusts
pg_operatorselectivity metadata. There is no C code, type, or wire-protocol change that python / go / node / JDBC drivers could observe.Notes for reviewers
matchingseldoes provide better estimates when good statistics exist on heavily-analyzedagtypecolumns. PostgreSQL core accepts the same trade-off for jsonb. A future improvement (out of scope here) would be a customagtype_contains_selectivitymirroringjsonb_sel; happy to file as a follow-up if there's interest.containment_selectivityregression test is intentionally minimal so the diff is loud and precise if anyone re-introducesmatchingselhere. The aggregate guard catches future operator additions that forget the right helper.