diff --git a/README.md b/README.md
index daf821c..fe1a6b1 100644
--- a/README.md
+++ b/README.md
@@ -12,34 +12,20 @@ without it.
## Quick Start
-Two equivalent interfaces build and maintain the per-partition summaries. Use whichever
-fits your workflow.
-
-### Function interface
+Summaries are built and maintained through a custom index access method, so pruning
+follows the normal index lifecycle (`pg_dump`/restore, `REINDEX`, `DROP INDEX`).
```sql
CREATE EXTENSION pg_table_range;
--- Register one or more columns of a partitioned (or plain) table and build summaries.
--- Pass the relation as an OID; cast a name with ::regclass::oid.
-SELECT table_range_create('events'::regclass::oid, ARRAY['val', 'created_at']);
+-- Summarize one or more columns of a partitioned (or plain) table.
+CREATE INDEX events_tr ON events USING table_range (val, created_at);
--- Queries now prune partitions whose summarized range cannot match the predicate.
+-- Queries now prune partitions whose summary cannot match the predicate.
-- Verify with EXPLAIN: non-matching partitions disappear from the plan.
EXPLAIN (COSTS OFF) SELECT * FROM events WHERE val >= 250;
--- Recompute after bulk loads (also clears staleness); or drop registration entirely.
-SELECT table_range_refresh('events'::regclass::oid);
-SELECT table_range_drop('events'::regclass::oid);
-```
-
-### Index interface
-
-```sql
--- Builds the same summaries via a custom index access method.
-CREATE INDEX events_tr ON events USING table_range (val, created_at);
-
--- Pruning works immediately; REINDEX rebuilds summaries after heavy churn.
+-- Recompute after heavy churn; or drop the summaries entirely.
REINDEX INDEX events_tr;
DROP INDEX events_tr; -- removes the summaries it built
```
@@ -47,6 +33,20 @@ DROP INDEX events_tr; -- removes the summaries it built
The index is never used for scans — it exists only to build and own the summaries — so
it adds no scan-time overhead and is never chosen by the planner for data access.
+### Supported column types (no setup, including PostGIS)
+
+`CREATE INDEX … USING table_range` works on any **btree-comparable** type and any
+**range** type out of the box. The required operator classes are provisioned
+automatically by mirroring the types that already have a btree/range operator class — and
+that mirror re-runs whenever an extension is installed, so **PostGIS geometry works the
+moment you `CREATE EXTENSION postgis`, with no extra step**:
+
+```sql
+CREATE EXTENSION postgis; -- geometry opclass auto-registers
+CREATE INDEX places_tr ON places USING table_range (geom);
+EXPLAIN (COSTS OFF) SELECT * FROM places WHERE geom && ST_MakeEnvelope(0,0,10,10);
+```
+
## How it works
- **Summaries.** For each leaf partition and indexed column, one row in
@@ -69,23 +69,30 @@ it adds no scan-time overhead and is never chosen by the planner for data access
is pruned by testing the constant against the partition's stored extent with
PostgreSQL's own `&&` operator — so a partition is eliminated when its extent cannot
overlap the query.
-- **Automatic correctness.** Data changes mark the affected partition's summaries
- *stale*, and stale summaries are never used for pruning — so a change can never cause a
- missing row. The function interface installs row-level triggers
- (`INSERT`/`UPDATE`/`DELETE`/`TRUNCATE`); the index interface marks stale from
- `aminsert`. `table_range_refresh` (or `REINDEX`) recomputes and re-enables pruning.
- A `sql_drop` event trigger removes summaries for any dropped relation or index.
+- **Automatic correctness.** An insert that extends a partition marks its summary
+ *stale* (via the index's `aminsert`), and stale summaries are never used for pruning —
+ so a change can never cause a missing row. Deletes only shrink a partition's true
+ range, so the summary stays conservatively wide and remains safe. `REINDEX` recomputes
+ and re-enables pruning after churn, and a `sql_drop` event trigger removes a dropped
+ index's (or table's) summaries.
## Performance
-The win is at **planning time** on wide trees, where the planner would otherwise build
-paths for every partition. On a 1000-partition table queried by a non-key column
-(`bench/planning_benchmark.sql`, PostgreSQL 18):
+The benefit is at **execution**: a selective predicate on a non-key column scans only the
+matching partition instead of every partition. On 100 partitions × 30k rows = 3M rows
+(`bench/planning_benchmark.sql`, PostgreSQL 18, warm):
+
+| | Total query time (plan + exec) |
+|---|---|
+| pruning off (scans all 100 partitions) | ~125 ms |
+| pruning on (scans 1 partition) | ~18 ms |
-| | Planning time | Result |
-|---|---|---|
-| pruning off | ~125 ms | 50 rows |
-| pruning on | ~17 ms | 50 rows |
+Pruning is **not** a free planning-time win: it adds a small per-plan overhead (loading
+summaries once, then evaluating each partition — single-digit to low-tens of ms on
+hundreds of partitions). It pays off when the partitions it eliminates are large enough
+that avoiding their scan outweighs that overhead — so it helps most on **large
+partitions with a selective non-key predicate**, and can be a slight net cost on tiny
+partitions. Use `table_range.enable_pruning` to measure both ways on your workload.
Summaries are loaded **once per plan** (not per partition); the
`e2e_per_plan_cache_loads_once_regardless_of_partitions` test asserts exactly one
@@ -114,20 +121,18 @@ Everything not listed is conservatively **kept** (never mispruned):
## Catalog
-- `table_range_summary` — one summary row per (owner, leaf partition, column):
+- `table_range_summary` — one summary row per (index, leaf partition, column):
`index_oid`, `relid`, `attnum`, `kind` (`minmax` or `overlap`), `type_name`,
`min_summary`, `max_summary`, `has_nulls`, `all_nulls`, `stale`, `tuple_version`.
-- `table_range_registered` — parents registered through the function interface and their
- columns.
## Project layout
| File | Responsibility |
|------|----------------|
| `src/lib.rs` | GUCs, `_PG_init`, catalog/bootstrap SQL, test wiring |
-| `src/summary_build.rs` | SPI summary build/refresh/drop, registration, staleness triggers |
+| `src/summary_build.rs` | SPI summary build (scalar min/max + range/geometry extent) |
| `src/prune_hook.rs` | planner + pathlist hooks, per-plan cache, typed in-memory evaluation |
-| `src/index_am.rs` | `table_range` index access method and operator classes |
+| `src/index_am.rs` | `table_range` index access method + automatic operator-class provisioning |
| `src/e2e_tests.rs`, `src/index_am_tests.rs` | end-to-end tests |
## Building and testing
@@ -149,7 +154,5 @@ range-type tests, which exercise the same code path.
- `NOT IN` / `<> ALL`, `NOT (...)`, expression predicates, and parameterized
prepared-statement plans are kept rather than pruned.
-- Summaries are exact at build/refresh time; between changes and a refresh the affected
- partitions are simply not pruned (always correct, just less selective).
-- The index interface marks a partition stale on insert; recompute with `REINDEX` (or
- `table_range_refresh` for the function interface) to re-enable pruning after churn.
+- Summaries are exact at build time; an insert that extends a partition marks it stale
+ (not pruned, but still correct) until the next `REINDEX`.
diff --git a/bench/planning_benchmark.sql b/bench/planning_benchmark.sql
index 4558a5f..a2463e8 100644
--- a/bench/planning_benchmark.sql
+++ b/bench/planning_benchmark.sql
@@ -1,49 +1,51 @@
--- Planning-time benchmark for table_range pruning.
+-- Benchmark for table_range pruning.
--
--- Builds a wide partition tree where the queried column is NOT the partition key, so
--- native PostgreSQL pruning cannot help, and compares planning time + plan size with
--- table_range pruning off vs. on. Run with:
+-- Measures end-to-end query time (planning + execution, warm) for a selective predicate
+-- on a NON-partition-key column, with table_range pruning on vs. off. Native PostgreSQL
+-- cannot prune on a non-key column, so without pruning the query scans every partition.
--
-- cargo pgrx run pg18
-- \i bench/planning_benchmark.sql
--
--- Look at the "Planning Time" line and the number of child plans in each EXPLAIN.
+-- Pruning trades a small per-plan overhead (loading summaries + evaluating each
+-- partition) for skipping the scan of non-matching partitions, so it wins when the
+-- partitions it eliminates are large enough to outweigh that overhead.
-\set part_count 1000
+\set part_count 100
+\set rows_per_part 30000
DROP TABLE IF EXISTS bench_events CASCADE;
-CREATE TABLE bench_events (region int, val bigint) PARTITION BY LIST (region);
+CREATE TABLE bench_events (region int, val bigint, pad text) PARTITION BY LIST (region);
--- One partition per region; each holds a disjoint 1000-wide band of `val`.
SELECT format(
- 'CREATE TABLE bench_events_%s PARTITION OF bench_events FOR VALUES IN (%s);',
- g, g
-)
+ 'CREATE TABLE bench_events_%s PARTITION OF bench_events FOR VALUES IN (%s);', g, g)
FROM generate_series(1, :part_count) g \gexec
+-- region is the partition key; val is a disjoint band per partition (the queried,
+-- non-key column).
INSERT INTO bench_events
-SELECT g, (g * 1000) + s
-FROM generate_series(1, :part_count) g,
- generate_series(0, 49) s;
+SELECT g, g * 1000000 + s, repeat('x', 50)
+FROM generate_series(1, :part_count) g, generate_series(0, :rows_per_part - 1) s;
-ANALYZE bench_events;
+VACUUM ANALYZE bench_events;
+CREATE INDEX bench_events_tr ON bench_events USING table_range (val);
-SELECT table_range_create('bench_events'::regclass::oid, ARRAY['val']);
+\timing on
-\echo '==================== pruning OFF ===================='
-SET table_range.enable_pruning = off;
-EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY ON)
-SELECT * FROM bench_events WHERE val BETWEEN 500000 AND 500049;
-
-\echo '==================== pruning ON ===================='
+-- Warm the relation cache first so the numbers reflect steady state, not first-touch
+-- partition-metadata loading (which both modes pay equally).
SET table_range.enable_pruning = on;
-EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY ON)
-SELECT * FROM bench_events WHERE val BETWEEN 500000 AND 500049;
+SELECT count(*) FROM bench_events WHERE val = 50000000;
+
+\echo '==================== pruning ON (warm) ===================='
+SELECT count(*) FROM bench_events WHERE val = 50000000;
+SELECT count(*) FROM bench_events WHERE val = 50000000;
-\echo '==================== correctness check (must match) ===================='
SET table_range.enable_pruning = off;
-SELECT count(*) AS off_count FROM bench_events WHERE val BETWEEN 500000 AND 500049;
-SET table_range.enable_pruning = on;
-SELECT count(*) AS on_count FROM bench_events WHERE val BETWEEN 500000 AND 500049;
+SELECT count(*) FROM bench_events WHERE val = 50000000;
+
+\echo '==================== pruning OFF (warm) ===================='
+SELECT count(*) FROM bench_events WHERE val = 50000000;
+SELECT count(*) FROM bench_events WHERE val = 50000000;
DROP TABLE bench_events CASCADE;
diff --git a/src/e2e_tests.rs b/src/e2e_tests.rs
index 3129926..ad2bf1c 100644
--- a/src/e2e_tests.rs
+++ b/src/e2e_tests.rs
@@ -3,19 +3,26 @@
// the pgrx test harness invokes.
//
// These exercise the full path: create a partitioned table, populate disjoint value
-// ranges per partition, build summaries with `table_range_create`, then verify that
-// (a) the planner eliminates non-matching partitions (via EXPLAIN) and (b) results are
-// identical with pruning on and off (no false negatives).
+// ranges per partition, build summaries with `CREATE INDEX ... USING table_range`, then
+// verify that (a) the planner eliminates non-matching partitions (via EXPLAIN) and
+// (b) results are identical with pruning on and off (no false negatives).
//
-// The partition key (`region`) deliberately differs from the queried data column
-// (`val`), so native PostgreSQL partition pruning cannot help — only the table_range
-// summaries can eliminate partitions here.
+// The partition key (`region`) deliberately differs from the queried data column, so
+// native PostgreSQL partition pruning cannot help — only the table_range summaries can.
+
+/// Build summaries for `cols` of `table` via a table_range index named `
_tr`.
+fn e2e_build(table: &str, cols: &str) {
+ Spi::run(&format!(
+ "CREATE INDEX {table}_tr ON {table} USING table_range ({cols})"
+ ))
+ .expect("create table_range index");
+}
/// Build a 3-way LIST-partitioned table with disjoint `val` ranges:
/// events_r1: region=1, val in [0, 99]
/// events_r2: region=2, val in [100, 199]
/// events_r3: region=3, val in [200, 299]
-/// Then register summaries on `val`.
+/// Then summarize `val`.
fn e2e_setup_events() {
Spi::run(
"DROP TABLE IF EXISTS events CASCADE;
@@ -28,11 +35,7 @@ fn e2e_setup_events() {
INSERT INTO events SELECT 3, g FROM generate_series(200, 299) g;",
)
.expect("setup events");
- let written =
- Spi::get_one::("SELECT table_range_create('events'::regclass::oid, ARRAY['val'])")
- .expect("create ok")
- .expect("create returned count");
- assert_eq!(written, 3, "one summary per leaf partition");
+ e2e_build("events", "val");
}
fn e2e_explain(query: &str) -> String {
@@ -50,6 +53,10 @@ fn e2e_explain(query: &str) -> String {
.expect("explain")
}
+fn e2e_explain_on(table: &str, pred: &str) -> String {
+ e2e_explain(&format!("SELECT * FROM {table} WHERE {pred}"))
+}
+
fn e2e_set_pruning(on: bool) {
Spi::run(&format!(
"SET table_range.enable_pruning = {}",
@@ -126,24 +133,6 @@ fn e2e_boundary_equality_keeps_correct_partition() {
assert_eq!(e2e_count_where("val = 100"), 1);
}
-#[pg_test]
-fn e2e_refresh_picks_up_new_data_range() {
- e2e_setup_events();
- e2e_set_pruning(true);
- let plan_before = e2e_explain("SELECT * FROM events WHERE val = 500");
- assert!(!plan_before.contains("events_r1"));
-
- Spi::run("INSERT INTO events VALUES (1, 500)").expect("insert");
- let n = Spi::get_one::("SELECT table_range_refresh('events'::regclass::oid)")
- .expect("refresh")
- .expect("count");
- assert_eq!(n, 3);
-
- let plan_after = e2e_explain("SELECT * FROM events WHERE val = 500");
- assert!(plan_after.contains("events_r1"), "r1 must reappear:\n{plan_after}");
- assert_eq!(e2e_count_where("val = 500"), 1);
-}
-
#[pg_test]
fn e2e_disabled_pruning_scans_all_partitions() {
e2e_setup_events();
@@ -154,25 +143,6 @@ fn e2e_disabled_pruning_scans_all_partitions() {
assert!(plan.contains("events_r3"));
}
-#[pg_test]
-fn e2e_drop_removes_summaries_and_pruning() {
- e2e_setup_events();
- assert_eq!(
- Spi::get_one::("SELECT table_range_summary_count('events'::regclass::oid)")
- .unwrap()
- .unwrap(),
- 3
- );
- assert!(Spi::get_one::("SELECT table_range_drop('events'::regclass::oid)")
- .unwrap()
- .unwrap());
- e2e_set_pruning(true);
- let plan = e2e_explain("SELECT * FROM events WHERE val >= 250");
- assert!(plan.contains("events_r1"));
- assert!(plan.contains("events_r2"));
- assert!(plan.contains("events_r3"));
-}
-
#[pg_test]
fn e2e_works_on_plain_unpartitioned_table() {
Spi::run(
@@ -181,9 +151,7 @@ fn e2e_works_on_plain_unpartitioned_table() {
INSERT INTO plain_t SELECT g FROM generate_series(0, 99) g;",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('plain_t'::regclass::oid, ARRAY['val'])")
- .unwrap()
- .unwrap();
+ e2e_build("plain_t", "val");
e2e_set_pruning(true);
assert_eq!(
Spi::get_one::("SELECT count(*)::bigint FROM plain_t WHERE val >= 50")
@@ -200,39 +168,30 @@ fn e2e_works_on_plain_unpartitioned_table() {
}
#[pg_test]
-fn e2e_insert_without_refresh_is_still_correct() {
+fn e2e_insert_keeps_results_correct_via_staleness() {
e2e_setup_events();
e2e_set_pruning(true);
// Sanity: before the insert, val=500 prunes everything (no partition covers it).
assert_eq!(e2e_count_where("val = 500"), 0);
- // Insert a value far outside r1's summarized range and DO NOT refresh.
+ // Insert a value far outside r1's summarized range. aminsert marks r1 stale, so the
+ // new row is still found — no false negative despite a now-stale summary.
Spi::run("INSERT INTO events VALUES (1, 500)").expect("insert");
-
- // The staleness trigger must have disabled pruning for r1, so the new row is
- // still found — no false negative despite a stale summary.
assert_eq!(
e2e_count_where("val = 500"),
1,
"stale summary must not prune away newly inserted matching rows"
);
- // r1 must reappear in the plan (kept due to staleness).
let plan = e2e_explain("SELECT * FROM events WHERE val = 500");
assert!(plan.contains("events_r1"), "r1 kept while stale:\n{plan}");
-
- // After refresh, pruning becomes effective again and the row is still correct.
- Spi::get_one::("SELECT table_range_refresh('events'::regclass::oid)")
- .unwrap()
- .unwrap();
- assert_eq!(e2e_count_where("val = 500"), 1);
}
#[pg_test]
fn e2e_delete_keeps_results_correct() {
e2e_setup_events();
e2e_set_pruning(true);
- // Deleting rows can only shrink a partition's true range; a now-too-wide summary
- // is conservative (safe). Results must stay correct with or without refresh.
+ // Deleting rows can only shrink a partition's true range; a now-too-wide summary is
+ // conservative (safe). Results must stay correct.
Spi::run("DELETE FROM events WHERE region = 2").expect("delete");
for pred in ["val = 150", "val >= 100 AND val < 200", "val < 50"] {
e2e_set_pruning(false);
@@ -260,16 +219,12 @@ fn e2e_large_tree_prunes_to_single_partition() {
))
.unwrap();
}
- Spi::get_one::("SELECT table_range_create('big'::regclass::oid, ARRAY['val'])")
- .unwrap()
- .unwrap();
+ e2e_build("big", "val");
e2e_set_pruning(true);
// Value 1750 lives only in partition p17 (1700..1799).
let plan = e2e_explain_on("big", "val = 1750");
assert!(plan.contains("big_p17"), "p17 kept:\n{plan}");
- assert!(!plan.contains("big_p0\n") && !plan.contains("big_p0 "), "p0 pruned:\n{plan}");
- // Count of "big_p" scan references should be exactly 1 (only p17).
let scans = plan.matches("big_p").count();
assert_eq!(scans, 1, "expected a single surviving partition:\n{plan}");
@@ -284,8 +239,8 @@ fn e2e_large_tree_prunes_to_single_partition() {
#[pg_test]
fn e2e_per_plan_cache_loads_once_regardless_of_partitions() {
- // 64 range partitions; planning a query must load summaries exactly once, not
- // once per partition — this is the observable signature of the per-plan cache.
+ // 64 range partitions; planning a query must load summaries exactly once, not once
+ // per partition — the observable signature of the per-plan cache.
Spi::run(
"DROP TABLE IF EXISTS cache_t CASCADE;
CREATE TABLE cache_t (val bigint) PARTITION BY RANGE (val);",
@@ -300,13 +255,10 @@ fn e2e_per_plan_cache_loads_once_regardless_of_partitions() {
))
.unwrap();
}
- Spi::get_one::("SELECT table_range_create('cache_t'::regclass::oid, ARRAY['val'])")
- .unwrap()
- .unwrap();
+ e2e_build("cache_t", "val");
e2e_set_pruning(true);
Spi::run("SELECT table_range_reset_cache_load_count()").unwrap();
- // One selective query over 64 partitions.
let found = Spi::get_one::("SELECT count(*)::bigint FROM cache_t WHERE val = 3333")
.unwrap()
.unwrap();
@@ -330,7 +282,9 @@ fn postgis_available() -> bool {
#[pg_test]
fn e2e_postgis_extent_pruning() {
// PostGIS is not installed in every test environment (e.g. the pgrx-managed pg18);
- // skip gracefully there. CI installs PostGIS so this runs for real.
+ // skip gracefully there. CI installs PostGIS so this runs for real. Creating the
+ // extension fires our event trigger, which registers the geometry opclass so
+ // CREATE INDEX ... USING table_range (geom) resolves with no manual step.
if !postgis_available() {
return;
}
@@ -346,9 +300,7 @@ fn e2e_postgis_extent_pruning() {
INSERT INTO ev_g SELECT 3, ST_MakePoint(200+x, 200+y) FROM generate_series(0,10) x, generate_series(0,10) y;",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('ev_g'::regclass::oid, ARRAY['geom'])")
- .unwrap()
- .unwrap();
+ e2e_build("ev_g", "geom");
e2e_set_pruning(true);
// A query box over partition 3's extent prunes partitions 1 and 2.
@@ -391,9 +343,7 @@ fn e2e_range_overlap_pruning() {
INSERT INTO ev_r SELECT 3, int8range(200+g*10, 200+g*10+10) FROM generate_series(0,9) g;",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('ev_r'::regclass::oid, ARRAY['period'])")
- .unwrap()
- .unwrap();
+ e2e_build("ev_r", "period");
e2e_set_pruning(true);
let plan = e2e_explain_on("ev_r", "period && int8range(250, 260)");
@@ -401,7 +351,6 @@ fn e2e_range_overlap_pruning() {
assert!(!plan.contains("ev_r_1"), "r1 pruned:\n{plan}");
assert!(!plan.contains("ev_r_2"), "r2 pruned:\n{plan}");
- // Correctness on/off for several overlap predicates, including spanning and empty.
let count = |pred: &str| {
Spi::get_one::(&format!("SELECT count(*)::bigint FROM ev_r WHERE {pred}"))
.unwrap()
@@ -409,9 +358,9 @@ fn e2e_range_overlap_pruning() {
};
for pred in [
"period && int8range(250, 260)",
- "period && int8range(95, 105)", // spans r1/r2 boundary
+ "period && int8range(95, 105)", // spans r1/r2 boundary
"period && int8range(1000, 2000)", // matches nothing
- "period && int8range(0, 300)", // matches everything
+ "period && int8range(0, 300)", // matches everything
] {
e2e_set_pruning(false);
let off = count(pred);
@@ -438,7 +387,6 @@ fn e2e_or_pruning() {
assert!(plan_wide.contains("events_r2"));
assert!(plan_wide.contains("events_r3"));
- // Correctness on/off for several OR shapes, including a nested AND inside the OR.
for pred in [
"val < 50 OR val >= 250",
"val = 5 OR val = 295",
@@ -470,7 +418,6 @@ fn e2e_in_list_pruning() {
assert!(!plan2.contains("events_r2"), "r2 pruned:\n{plan2}");
assert!(!plan2.contains("events_r3"), "r3 pruned:\n{plan2}");
- // Correctness on/off across several IN lists (incl. NULL element and no matches).
for pred in [
"val IN (5, 250)",
"val IN (5, 25, 75)",
@@ -486,22 +433,6 @@ fn e2e_in_list_pruning() {
}
}
-fn e2e_explain_on(table: &str, pred: &str) -> String {
- Spi::connect(|client| {
- let q = format!("EXPLAIN (COSTS OFF) SELECT * FROM {table} WHERE {pred}");
- let t = client.select(&q, None, &[])?;
- let mut out = String::new();
- for row in t {
- if let Ok(Some(line)) = row.get::(1) {
- out.push_str(&line);
- out.push('\n');
- }
- }
- Ok::(out)
- })
- .expect("explain")
-}
-
#[pg_test]
fn e2e_timestamptz_pruning() {
Spi::run(
@@ -515,16 +446,13 @@ fn e2e_timestamptz_pruning() {
INSERT INTO ev_ts SELECT 3, timestamptz '2024-03-01' + (g||' days')::interval FROM generate_series(0,27) g;",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('ev_ts'::regclass::oid, ARRAY['ts'])")
- .unwrap()
- .unwrap();
+ e2e_build("ev_ts", "ts");
e2e_set_pruning(true);
let plan = e2e_explain_on("ev_ts", "ts >= timestamptz '2024-03-01'");
assert!(plan.contains("ev_ts_3"), "march must remain:\n{plan}");
assert!(!plan.contains("ev_ts_1"), "jan pruned:\n{plan}");
assert!(!plan.contains("ev_ts_2"), "feb pruned:\n{plan}");
- // Correctness on/off.
let pred = "ts >= timestamptz '2024-02-15' AND ts < timestamptz '2024-03-10'";
e2e_set_pruning(false);
let off = Spi::get_one::(&format!("SELECT count(*)::bigint FROM ev_ts WHERE {pred}"))
@@ -550,9 +478,7 @@ fn e2e_text_pruning() {
INSERT INTO ev_txt VALUES (3,'watermelon'),(3,'xigua'),(3,'zucchini');",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('ev_txt'::regclass::oid, ARRAY['name'])")
- .unwrap()
- .unwrap();
+ e2e_build("ev_txt", "name");
e2e_set_pruning(true);
let plan = e2e_explain_on("ev_txt", "name >= 'watermelon'");
assert!(plan.contains("ev_txt_3"), "third must remain:\n{plan}");
@@ -582,9 +508,7 @@ fn e2e_float_pruning() {
INSERT INTO ev_f SELECT 2, 100.0 + g * 1.5 FROM generate_series(0,49) g;",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('ev_f'::regclass::oid, ARRAY['amt'])")
- .unwrap()
- .unwrap();
+ e2e_build("ev_f", "amt");
e2e_set_pruning(true);
let plan = e2e_explain_on("ev_f", "amt > 120.0");
assert!(plan.contains("ev_f_2"));
@@ -599,18 +523,12 @@ fn e2e_multicolumn_and_semantics() {
CREATE TABLE mc_1 PARTITION OF mc FOR VALUES IN (1);
CREATE TABLE mc_2 PARTITION OF mc FOR VALUES IN (2);
CREATE TABLE mc_3 PARTITION OF mc FOR VALUES IN (3);
- -- a and b ranges per partition:
- -- p1: a[0..99] b[0..99]
- -- p2: a[100..199] b[100..199]
- -- p3: a[200..299] b[200..299]
INSERT INTO mc SELECT 1, g, g FROM generate_series(0,99) g;
INSERT INTO mc SELECT 2, 100+g, 100+g FROM generate_series(0,99) g;
INSERT INTO mc SELECT 3, 200+g, 200+g FROM generate_series(0,99) g;",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('mc'::regclass::oid, ARRAY['a','b'])")
- .unwrap()
- .unwrap();
+ e2e_build("mc", "a, b");
e2e_set_pruning(true);
// a >= 250 keeps only p3; b < 50 alone keeps only p1; together -> empty.
let plan = e2e_explain_on("mc", "a >= 250 AND b < 50");
@@ -618,7 +536,6 @@ fn e2e_multicolumn_and_semantics() {
assert!(!plan.contains("mc_2"), "p2 pruned by both:\n{plan}");
assert!(!plan.contains("mc_3"), "p3 pruned by b:\n{plan}");
- // Correctness across several multi-column predicates.
for pred in [
"a >= 250 AND b < 50",
"a < 150 AND b > 50",
@@ -645,14 +562,12 @@ fn e2e_is_null_pruning() {
CREATE TABLE nt_nonull PARTITION OF nt FOR VALUES IN (1);
CREATE TABLE nt_allnull PARTITION OF nt FOR VALUES IN (2);
CREATE TABLE nt_mixed PARTITION OF nt FOR VALUES IN (3);
- INSERT INTO nt SELECT 1, g FROM generate_series(1,50) g; -- no nulls
- INSERT INTO nt SELECT 2, NULL FROM generate_series(1,50) g; -- all null
+ INSERT INTO nt SELECT 1, g FROM generate_series(1,50) g;
+ INSERT INTO nt SELECT 2, NULL FROM generate_series(1,50) g;
INSERT INTO nt SELECT 3, CASE WHEN g % 2 = 0 THEN g ELSE NULL END FROM generate_series(1,50) g;",
)
.unwrap();
- Spi::get_one::("SELECT table_range_create('nt'::regclass::oid, ARRAY['val'])")
- .unwrap()
- .unwrap();
+ e2e_build("nt", "val");
e2e_set_pruning(true);
// IS NULL: the no-null partition can be pruned; all-null and mixed remain.
@@ -667,7 +582,6 @@ fn e2e_is_null_pruning() {
assert!(plan_nn.contains("nt_nonull"));
assert!(plan_nn.contains("nt_mixed"));
- // Correctness on/off for both null predicates.
for pred in ["val IS NULL", "val IS NOT NULL"] {
e2e_set_pruning(false);
let off = Spi::get_one::(&format!("SELECT count(*)::bigint FROM nt WHERE {pred}"))
diff --git a/src/index_am.rs b/src/index_am.rs
index cbe9832..8a92a52 100644
--- a/src/index_am.rs
+++ b/src/index_am.rs
@@ -37,8 +37,8 @@ unsafe extern "C-unwind" fn xact_callback(
// Instead, `ambuild` scans the (leaf) relation and writes one min/max/null summary per
// indexed column into `table_range_summary` (keyed by the index OID), and installs
// the staleness trigger that keeps the summary conservative on data changes. The planner
-// hook then prunes partitions exactly as it does for the function interface. The index is
-// never chosen for scans (no `amgettuple`/`amgetbitmap`, prohibitive cost estimate).
+// hook then prunes partitions using those summaries. The index is never chosen for scans
+// (no `amgettuple`/`amgetbitmap`, prohibitive cost estimate).
/// V1 function-info record so PostgreSQL can call `table_range_amhandler` as a
/// `LANGUAGE c` function declared in the access-method SQL below.
@@ -100,7 +100,7 @@ unsafe extern "C-unwind" fn am_build(
pg_sys::PushActiveSnapshot(pg_sys::GetTransactionSnapshot());
pgrx::PgTryBuilder::new(|| {
if let Ok(names) = crate::summary_build::column_names_for_attnums(heap_relid, &attnums) {
- let _ = crate::summary_build::build_one_leaf(index_relid, heap_relid, &names, false);
+ let _ = crate::summary_build::build_one_leaf(index_relid, heap_relid, &names);
}
})
.catch_others(|_| ())
@@ -212,27 +212,69 @@ extension_sql!(
CREATE ACCESS METHOD table_range TYPE INDEX HANDLER table_range_amhandler;
COMMENT ON ACCESS METHOD table_range IS
- 'Conservative early partition pruning via min/max range summaries';
-
- -- Minimal default operator classes so CREATE INDEX ... USING table_range resolves a
- -- class for each common column type. The AM stores only summaries, so these carry no
- -- operators or support procedures (amvalidate accepts them).
- CREATE OPERATOR CLASS bool_tr_ops DEFAULT FOR TYPE boolean USING table_range AS STORAGE boolean;
- CREATE OPERATOR CLASS int2_tr_ops DEFAULT FOR TYPE smallint USING table_range AS STORAGE smallint;
- CREATE OPERATOR CLASS int4_tr_ops DEFAULT FOR TYPE integer USING table_range AS STORAGE integer;
- CREATE OPERATOR CLASS int8_tr_ops DEFAULT FOR TYPE bigint USING table_range AS STORAGE bigint;
- CREATE OPERATOR CLASS float4_tr_ops DEFAULT FOR TYPE real USING table_range AS STORAGE real;
- CREATE OPERATOR CLASS float8_tr_ops DEFAULT FOR TYPE double precision USING table_range AS STORAGE double precision;
- CREATE OPERATOR CLASS numeric_tr_ops DEFAULT FOR TYPE numeric USING table_range AS STORAGE numeric;
- CREATE OPERATOR CLASS text_tr_ops DEFAULT FOR TYPE text USING table_range AS STORAGE text;
- CREATE OPERATOR CLASS varchar_tr_ops DEFAULT FOR TYPE varchar USING table_range AS STORAGE varchar;
- CREATE OPERATOR CLASS bpchar_tr_ops DEFAULT FOR TYPE bpchar USING table_range AS STORAGE bpchar;
- CREATE OPERATOR CLASS date_tr_ops DEFAULT FOR TYPE date USING table_range AS STORAGE date;
- CREATE OPERATOR CLASS time_tr_ops DEFAULT FOR TYPE time USING table_range AS STORAGE time;
- CREATE OPERATOR CLASS timestamp_tr_ops DEFAULT FOR TYPE timestamp USING table_range AS STORAGE timestamp;
- CREATE OPERATOR CLASS timestamptz_tr_ops DEFAULT FOR TYPE timestamptz USING table_range AS STORAGE timestamptz;
- CREATE OPERATOR CLASS uuid_tr_ops DEFAULT FOR TYPE uuid USING table_range AS STORAGE uuid;
- CREATE OPERATOR CLASS oid_tr_ops DEFAULT FOR TYPE oid USING table_range AS STORAGE oid;
+ 'Early partition pruning via per-partition data-range summaries';
+
+ -- CREATE INDEX ... USING table_range needs a default operator class for the column
+ -- type. Rather than hardcode a list, mirror the operator-class coverage that already
+ -- exists: any btree-ordered type (scalar min/max), every range type, and PostGIS
+ -- geometry/geography (extent). The AM stores only summaries, so these classes carry
+ -- no operators or support procedures. This runs at install and again whenever an
+ -- extension is created, so installing PostGIS makes geometry "just work" with no
+ -- manual step.
+ CREATE FUNCTION table_range_sync_opclasses() RETURNS void
+ LANGUAGE plpgsql AS $$
+ DECLARE
+ tr_am oid;
+ r record;
+ BEGIN
+ SELECT oid INTO tr_am FROM pg_am WHERE amname = 'table_range';
+ IF tr_am IS NULL THEN
+ RETURN;
+ END IF;
+ FOR r IN
+ SELECT DISTINCT cand.typid, format_type(cand.typid, NULL) AS typname
+ FROM (
+ SELECT bc.opcintype AS typid
+ FROM pg_opclass bc JOIN pg_am am ON am.oid = bc.opcmethod
+ WHERE am.amname = 'btree' AND bc.opcdefault
+ UNION
+ SELECT t.oid FROM pg_type t WHERE t.typtype = 'r'
+ UNION
+ SELECT t.oid FROM pg_type t WHERE t.typname IN ('geometry', 'geography')
+ ) cand
+ WHERE cand.typid NOT IN ('anyrange'::regtype, 'anyarray'::regtype)
+ AND NOT EXISTS (
+ SELECT 1 FROM pg_opclass tc
+ WHERE tc.opcmethod = tr_am AND tc.opcdefault
+ AND tc.opcintype = cand.typid
+ )
+ LOOP
+ BEGIN
+ EXECUTE format(
+ 'CREATE OPERATOR CLASS %I DEFAULT FOR TYPE %s USING table_range AS STORAGE %s',
+ 'tr_' || r.typid, r.typname, r.typname);
+ EXCEPTION WHEN OTHERS THEN
+ -- Skip types that cannot host a storage-only opclass.
+ NULL;
+ END;
+ END LOOP;
+ END;
+ $$;
+
+ SELECT table_range_sync_opclasses();
+
+ CREATE FUNCTION table_range_opclass_sync_evt() RETURNS event_trigger
+ LANGUAGE plpgsql AS $$
+ BEGIN
+ PERFORM table_range_sync_opclasses();
+ END;
+ $$;
+
+ -- Re-sync when any extension is installed (e.g. PostGIS), so new types that gain a
+ -- btree/geometry opclass automatically become usable with table_range.
+ CREATE EVENT TRIGGER table_range_opclass_sync_trg
+ ON ddl_command_end WHEN TAG IN ('CREATE EXTENSION')
+ EXECUTE FUNCTION table_range_opclass_sync_evt();
"#,
name = "table_range_access_method",
requires = ["table_range_bootstrap_sql"]
diff --git a/src/lib.rs b/src/lib.rs
index 351a8f2..945d75b 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -43,8 +43,8 @@ pub extern "C-unwind" fn _PG_init() {
extension_sql!(
r#"
- -- One summary per (registered relation, leaf partition, column).
- -- index_oid: the registering parent relation OID (or index OID via the AM).
+ -- One summary per (index, leaf partition, column), built by the index AM.
+ -- index_oid: the (leaf) index relation OID.
-- relid: the leaf partition (heap) OID the planner sees.
-- attnum: the leaf partition's attnum for the column (resolved by name).
-- kind: 'minmax' -> min_summary/max_summary hold the column's btree min/max;
@@ -74,29 +74,8 @@ extension_sql!(
ON table_range_summary (relid)
WHERE stale;
- -- Registration of parent relations whose partitions are summarized for pruning.
- CREATE TABLE IF NOT EXISTS table_range_registered (
- parent_relid oid PRIMARY KEY,
- columns text[] NOT NULL,
- created_at timestamptz NOT NULL DEFAULT now(),
- refreshed_at timestamptz NOT NULL DEFAULT now()
- );
-
- -- Row/statement trigger that marks a partition's summaries stale when its data
- -- changes. The planner hook ignores stale summaries (treats them as KEEP), so any
- -- modification automatically and conservatively disables pruning for that partition
- -- until table_range_refresh() recomputes it. This is the correctness safety net:
- -- no INSERT/UPDATE/DELETE/TRUNCATE can ever cause a false negative.
- CREATE OR REPLACE FUNCTION table_range_stale_trigger() RETURNS trigger
- LANGUAGE plpgsql AS $$
- BEGIN
- UPDATE table_range_summary SET stale = true WHERE relid = TG_RELID;
- RETURN NULL;
- END;
- $$;
-
- -- Drop summaries/registration for any relation that is dropped, so a dropped
- -- table_range index can never leave behind a summary that nothing keeps stale.
+ -- Drop summaries for any relation that is dropped, so a dropped table_range index
+ -- (or its table) can never leave behind a summary that nothing keeps stale.
CREATE FUNCTION table_range_drop_cleanup() RETURNS event_trigger
LANGUAGE c AS 'MODULE_PATHNAME', 'table_range_drop_cleanup';
CREATE EVENT TRIGGER table_range_drop_trg ON sql_drop
diff --git a/src/prune_hook.rs b/src/prune_hook.rs
index 715051d..5da946b 100644
--- a/src/prune_hook.rs
+++ b/src/prune_hook.rs
@@ -11,10 +11,12 @@ use crate::{TABLE_RANGE_ENABLE_PRUNING, TABLE_RANGE_LOG_PRUNING_DEBUG};
/// One load per top-level plan (regardless of partition count) proves the per-plan cache.
static CACHE_LOADS: AtomicU64 = AtomicU64::new(0);
+#[cfg(any(test, feature = "pg_test"))]
pub fn cache_load_count() -> u64 {
CACHE_LOADS.load(Ordering::Relaxed)
}
+#[cfg(any(test, feature = "pg_test"))]
pub fn reset_cache_load_count() {
CACHE_LOADS.store(0, Ordering::Relaxed);
}
diff --git a/src/summary_build.rs b/src/summary_build.rs
index ca164f8..5731ec5 100644
--- a/src/summary_build.rs
+++ b/src/summary_build.rs
@@ -2,16 +2,14 @@ use pgrx::prelude::*;
use pgrx::spi::SpiError;
use std::sync::OnceLock;
-// Real, SPI-driven summary maintenance for the table_range pruning extension.
+// SPI-driven summary maintenance for the table_range pruning extension.
//
-// Scans a registered parent relation's leaf partitions and persists one
-// per-(partition, column) min/max/null summary into `table_range_summary`.
-//
-// Summaries are keyed by:
-// - `index_oid` = the registered parent relation OID (synthetic key, no real index),
-// - `relid` = the leaf partition OID,
-// - `attnum` = the leaf partition's attnum for the column (resolved by name so
-// differing physical column order across partitions is handled).
+// Summaries are built by the index access method's `ambuild` (see `index_am.rs`): for
+// each leaf partition it scans the column's real data and persists one summary row into
+// `table_range_summary`, keyed by:
+// - `index_oid` = the (leaf) index relation OID,
+// - `relid` = the leaf partition (heap) OID the planner sees,
+// - `attnum` = the leaf partition's attnum for the column.
//
// Correctness: a missing or `stale` summary means "do not prune". We never persist a
// summary that could cause a false negative; on any failure we leave the partition
@@ -42,99 +40,6 @@ pub(crate) fn summary_table() -> String {
format!("{}.table_range_summary", schema())
}
-/// Schema-qualified name of the registration table.
-pub(crate) fn registered_table() -> String {
- format!("{}.table_range_registered", schema())
-}
-
-/// Register a parent relation and build summaries for the named columns.
-#[pg_extern]
-fn table_range_create(parent: pg_sys::Oid, columns: Vec) -> i64 {
- if columns.is_empty() {
- error!("table_range_create: at least one column is required");
- }
- validate_columns_exist(parent, &columns);
-
- // Persist registration (idempotent).
- let cols_literal = pg_array_text_literal(&columns);
- let reg = format!(
- "INSERT INTO table_range_registered (parent_relid, columns, refreshed_at) \
- VALUES ({}::oid, {}, now()) \
- ON CONFLICT (parent_relid) DO UPDATE SET columns = EXCLUDED.columns, refreshed_at = now()",
- oid_u32(parent),
- cols_literal
- );
- Spi::run(®).unwrap_or_else(|e| error!("table_range_create: failed to register parent: {e}"));
-
- build_summaries(parent, &columns)
- .unwrap_or_else(|e| error!("table_range_create: summary build failed: {e}"))
-}
-
-/// Recompute summaries for an already-registered parent relation.
-#[pg_extern]
-fn table_range_refresh(parent: pg_sys::Oid) -> i64 {
- let columns = registered_columns(parent)
- .unwrap_or_else(|e| error!("table_range_refresh: lookup failed: {e}"));
- let columns = match columns {
- Some(c) => c,
- None => error!(
- "table_range_refresh: parent {} is not registered",
- oid_u32(parent)
- ),
- };
- let written = build_summaries(parent, &columns)
- .unwrap_or_else(|e| error!("table_range_refresh: summary build failed: {e}"));
- Spi::run(&format!(
- "UPDATE table_range_registered SET refreshed_at = now() WHERE parent_relid = {}::oid",
- oid_u32(parent)
- ))
- .ok();
- written
-}
-
-/// Unregister a parent relation, drop its summaries, and remove its triggers.
-#[pg_extern]
-fn table_range_drop(parent: pg_sys::Oid) -> bool {
- // Remove staleness triggers from every leaf first (best-effort).
- if let Ok(leaves) = leaf_partitions(parent) {
- for leaf in leaves {
- if let Ok(Some(name)) = relation_name(leaf) {
- let _ = Spi::run(&format!(
- "DROP TRIGGER IF EXISTS {trg} ON {tbl}; \
- DROP TRIGGER IF EXISTS {trg}_trunc ON {tbl}",
- trg = STALE_TRIGGER_NAME,
- tbl = name
- ));
- }
- }
- }
-
- let p = oid_u32(parent);
- Spi::run(&format!(
- "DELETE FROM {tbl} WHERE index_oid = {p}::oid",
- tbl = summary_table()
- ))
- .and_then(|_| {
- Spi::run(&format!(
- "DELETE FROM table_range_registered WHERE parent_relid = {p}::oid"
- ))
- })
- .is_ok()
-}
-
-/// Number of leaf partitions currently summarized for a parent.
-#[pg_extern]
-fn table_range_summary_count(parent: pg_sys::Oid) -> i64 {
- Spi::get_one::(&format!(
- "SELECT count(DISTINCT relid)::bigint FROM {tbl} WHERE index_oid = {p}::oid",
- tbl = summary_table(),
- p = oid_u32(parent)
- ))
- .ok()
- .flatten()
- .unwrap_or(0)
-}
-
/// V1 record for the `sql_drop` event-trigger cleanup function.
#[no_mangle]
pub extern "C" fn pg_finfo_table_range_drop_cleanup() -> &'static pg_sys::Pg_finfo_record {
@@ -142,8 +47,8 @@ pub extern "C" fn pg_finfo_table_range_drop_cleanup() -> &'static pg_sys::Pg_fin
&V1_API
}
-/// Event-trigger handler: when any relation is dropped, remove the summaries and
-/// registration that referenced it. This closes the correctness gap where a dropped
+/// Event-trigger handler: when any relation is dropped, remove the summaries that
+/// referenced it (by index OID or leaf OID). This closes the gap where a dropped
/// `table_range` index would leave summaries behind that nothing keeps stale anymore.
#[no_mangle]
#[pg_guard]
@@ -156,138 +61,23 @@ pub unsafe extern "C-unwind" fn table_range_drop_cleanup(
WHERE s.index_oid = d.objid OR s.relid = d.objid",
tbl = summary_table()
));
- let _ = Spi::run(&format!(
- "DELETE FROM {reg} r USING pg_event_trigger_dropped_objects() d \
- WHERE r.parent_relid = d.objid",
- reg = registered_table()
- ));
})
.catch_others(|_| ())
.execute();
pg_sys::Datum::from(0)
}
-fn validate_columns_exist(parent: pg_sys::Oid, columns: &[String]) {
- for col in columns {
- let found = Spi::get_one::(&format!(
- "SELECT EXISTS (SELECT 1 FROM pg_attribute \
- WHERE attrelid = {}::oid AND attname = {} AND attnum > 0 AND NOT attisdropped)",
- oid_u32(parent),
- quote_literal(col)
- ))
- .ok()
- .flatten()
- .unwrap_or(false);
- if !found {
- error!(
- "table_range_create: column {:?} does not exist on relation {}",
- col,
- oid_u32(parent)
- );
- }
- }
-}
-
-fn registered_columns(parent: pg_sys::Oid) -> Result