Index-resident, incrementally-maintained summaries (BRIN-style); remove side table + REINDEX by bitner · Pull Request #2 · bitner/pg_table_range

bitner · 2026-06-22T22:30:54Z

Replaces the side-table + mark-stale + REINDEX maintenance model with self-maintaining, index-resident summaries — the way BRIN works. Follow-up to #1.

What changed

- Summaries live in the index's own metapage (src/index_storage.rs): page I/O via the Generic WAL API plus a compact, versioned (de)serialization. No side table.
- ambuild scans the leaf partition and writes the summary to the metapage.
- The planner reads each partition's summary directly from its index page (cached per plan), instead of an SPI load of a side table.
- aminsert widens the summary in place as rows arrive, as BRIN does. Because the summary only ever has to be over-inclusive, these updates need no MVCC: an insert within range writes nothing; one that extends it grows min/max (scalars, in memory) or the extent (range/geometry, via the type's union). Pruning stays correct and active across inserts with no REINDEX.
- Removed: the table_range_summary table, the stale flag + per-transaction memo, and the sql_drop cleanup event trigger (DROP INDEX frees the summary with the index). Deletes leave the summary conservatively wide (safe); VACUUM/REINDEX re-tighten.
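The over-inclusive widening rule can be modeled in plain Rust. This is a minimal sketch, not the extension's code. `MinMax`, `BBox`, `widen` and `may_contain` are hypothetical names. The real summaries live in the index metapage and use each type's own comparison and union functions. The sketch shows that `widen` reports whether anything changed, so an in-range insert produces no page write at all.

```rust
/// Hypothetical scalar summary: an inclusive [min, max] over every value seen.
#[derive(Debug, Clone, PartialEq)]
struct MinMax<T: PartialOrd + Clone> {
    min: T,
    max: T,
}

impl<T: PartialOrd + Clone> MinMax<T> {
    /// Widen to cover `v`. Returns true only if the summary changed,
    /// i.e. only then would the metapage need rewriting.
    fn widen(&mut self, v: &T) -> bool {
        let mut changed = false;
        if *v < self.min {
            self.min = v.clone();
            changed = true;
        }
        if *v > self.max {
            self.max = v.clone();
            changed = true;
        }
        changed
    }

    /// Pruning test: could a row equal to `v` be in this partition?
    fn may_contain(&self, v: &T) -> bool {
        self.min <= *v && *v <= self.max
    }
}

/// Hypothetical extent summary for geometry-like values: an axis-aligned box.
/// Widening is the union of the stored box and the new value's box.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
}

impl BBox {
    fn union(self, o: BBox) -> BBox {
        BBox {
            xmin: self.xmin.min(o.xmin),
            ymin: self.ymin.min(o.ymin),
            xmax: self.xmax.max(o.xmax),
            ymax: self.ymax.max(o.ymax),
        }
    }

    /// Returns true only if the union actually grew the stored box.
    fn widen(&mut self, o: BBox) -> bool {
        let u = self.union(o);
        let changed = u != *self;
        *self = u;
        changed
    }
}

fn main() {
    let mut s = MinMax { min: 10, max: 20 };
    assert!(!s.widen(&15)); // in range: nothing to write
    assert!(s.widen(&25)); // extends max: summary must be rewritten
    assert_eq!(s, MinMax { min: 10, max: 25 });
    assert!(s.may_contain(&25)); // the new row stays findable

    let mut b = BBox { xmin: 0.0, ymin: 0.0, xmax: 1.0, ymax: 1.0 };
    assert!(!b.widen(BBox { xmin: 0.2, ymin: 0.2, xmax: 0.5, ymax: 0.5 }));
    assert!(b.widen(BBox { xmin: -1.0, ymin: 0.5, xmax: 0.5, ymax: 2.0 }));
    println!("{:?}", b);
}
```

Nothing in this model ever shrinks a summary. That mirrors why deletes leave it conservatively wide until a VACUUM or REINDEX recomputes it.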

Load-bearing dependency discovered

The planner only puts indisvalid indexes into rel->indexlist, which is where the new read path looks. So a table_range index must stay valid for pruning to engage. This never mattered on main (it read a side table by relid) but matters now; it's documented in read_index_summary. (Surfaced while benchmarking against a stale dev DB carrying a pre-main "hide indexes" trigger that marked indexes invalid.)
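A small model shows why an invalid index disables pruning but never breaks correctness. `IndexEntry`, `summary_for` and `can_prune` are hypothetical. The only assumption carried over from the text is that the read path sees just the indexes the planner put in `rel->indexlist`, i.e. valid ones.

```rust
/// Hypothetical stand-in for an index the planner might see on a partition.
struct IndexEntry {
    is_table_range: bool,
    indisvalid: bool,
    /// Summary as (min, max), as if read from the index metapage.
    summary: (i64, i64),
}

/// Models the read path: only valid indexes reach rel->indexlist,
/// so an invalid table_range index contributes no summary at all.
fn summary_for(indexes: &[IndexEntry]) -> Option<(i64, i64)> {
    indexes
        .iter()
        .filter(|ix| ix.indisvalid) // the planner's filter, applied before our hook runs
        .find(|ix| ix.is_table_range)
        .map(|ix| ix.summary)
}

/// A partition is pruned only when a summary exists and excludes the value.
/// No summary means "keep the partition": slower, but never wrong.
fn can_prune(indexes: &[IndexEntry], value: i64) -> bool {
    match summary_for(indexes) {
        Some((lo, hi)) => value < lo || value > hi,
        None => false,
    }
}

fn main() {
    let valid = [IndexEntry { is_table_range: true, indisvalid: true, summary: (0, 99) }];
    let invalid = [IndexEntry { is_table_range: true, indisvalid: false, summary: (0, 99) }];
    assert!(can_prune(&valid, 500));
    assert!(!can_prune(&invalid, 500)); // pruning silently disengages
    println!("ok");
}
```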

Planning-cost optimization

The pathlist hook runs once per partition and was re-resolving the btree compare proc (3 syscache lookups) and re-parsing the query constant for every partition. Both are identical across a column's partitions, so they're now memoized per top-level plan. Warm planning for a non-key predicate at 2,000 partitions dropped ~139 ms → ~80 ms (our per-partition overhead roughly halved).
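The memoization can be sketched as a cache keyed by whatever is identical across a column's partitions. Everything here is hypothetical. `PlanCache`, `ColKey`, `Resolved` and the resolver closure stand in for the btree compare-proc lookup (three syscache lookups in the real hook) and for parsing the query constant. Only the idea of clearing the cache between top-level plans, as `clear_cache` does in the PR, comes from the text.

```rust
use std::collections::HashMap;

/// Hypothetical key: (column type OID, text of the query constant).
type ColKey = (u32, String);

#[derive(Clone, Debug, PartialEq)]
struct Resolved {
    cmp_proc: u32, // stand-in for the btree comparison proc OID
    parsed: i64,   // stand-in for the parsed query constant
}

#[derive(Default)]
struct PlanCache {
    entries: HashMap<ColKey, Resolved>,
    misses: usize, // how many times the expensive resolution actually ran
}

impl PlanCache {
    /// Return the cached resolution, computing it at most once per plan.
    fn get_or_resolve(&mut self, key: ColKey, resolve: impl FnOnce() -> Resolved) -> Resolved {
        if let Some(r) = self.entries.get(&key) {
            return r.clone();
        }
        self.misses += 1;
        let r = resolve();
        self.entries.insert(key, r.clone());
        r
    }

    /// Called per top-level plan so entries from a previous query are never reused.
    fn clear(&mut self) {
        self.entries.clear();
        self.misses = 0;
    }
}

fn main() {
    let mut cache = PlanCache::default();
    // The pathlist hook fires once per partition; simulate 2,000 of them.
    for _partition in 0..2000 {
        // 20 is int8's type OID; cmp_proc 1 is a placeholder, not a real OID.
        let r = cache.get_or_resolve((20, "42".to_string()), || Resolved {
            cmp_proc: 1,
            parsed: "42".parse().unwrap(),
        });
        assert_eq!((r.cmp_proc, r.parsed), (1, 42));
    }
    assert_eq!(cache.misses, 1); // resolved once, not 2,000 times
    cache.clear();
    println!("misses after clear: {}", cache.misses);
}
```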

Benchmarks (warm, `EXPLAIN ANALYZE`, pg18)

vs the old side-table design (main) — no regression. Reading each partition's summary from its index page is as cheap as main's batched side-table load; planning within ~1–3%, execution identical. Pruning's win remains execution: e.g. ~0.4 ms vs ~100 ms at 300×8k-row partitions.

vs native declarative pruning (two identical columns: pk = partition key, nk = same values, non-key + our index):

| Partitions | Native (pk) planning | Ours (nk) planning |
|---:|---:|---:|
| 2,000 | 0.15 ms | ~80 ms (post-optimization) |
| 10,000 | 0.29 ms | `ERROR: out of shared memory` |

Scaling limitations (documented honestly)

Native pruning is ~constant-time because it prunes on the partition key's sorted bounds before partitions are locked/opened. A non-key predicate forces PG to expand and lock every partition. That is O(n) and exhausts the lock table at around 10k partitions (raise max_locks_per_transaction to push that out). This is inherent to PG's planner: non-key pruning can't be made sub-linear from public hooks. table_range targets the sweet spot of hundreds to low thousands of sizeable partitions queried by non-key predicates, where it wins big on execution. True tens-of-thousands scaling would need pre-expansion pruning (a core-PG hook), tracked separately.
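The lock-table ceiling is a capacity question, which a back-of-the-envelope calculation makes concrete. This sketch rests on several assumptions:

- It uses the approximation from PostgreSQL's documentation that the shared lock table holds about `max_locks_per_transaction * (max_connections + max_prepared_transactions)` objects.
- `locks_per_partition` (one for the partition plus one per index the planner opens) is an assumption, not a measured figure.
- Fast-path lock slots and extra reserved backends are ignored.

Treat the output as an order-of-magnitude guide only.

```rust
/// Rough shared-lock-table capacity, using the documented approximation
/// max_locks_per_transaction * (max_connections + max_prepared_transactions).
/// Ignores fast-path slots and extra reserved backends: order of magnitude only.
fn lock_table_capacity(max_locks_per_transaction: u64, max_connections: u64, max_prepared_transactions: u64) -> u64 {
    max_locks_per_transaction * (max_connections + max_prepared_transactions)
}

/// Locks one planning pass needs when every partition must be expanded.
/// `locks_per_partition` is an assumption: 1 for the heap + 1 per index opened.
fn locks_needed(partitions: u64, locks_per_partition: u64) -> u64 {
    partitions * locks_per_partition
}

fn main() {
    // Stock defaults: max_locks_per_transaction = 64, max_connections = 100,
    // max_prepared_transactions = 0.
    let cap = lock_table_capacity(64, 100, 0);
    for parts in [2_000u64, 10_000] {
        let need = locks_needed(parts, 2);
        let verdict = if need <= cap { "fits" } else { "likely exhausted" };
        println!("{parts} partitions: need ~{need} of ~{cap} slots -> {verdict}");
    }
    // Raising max_locks_per_transaction (requires a restart) pushes the ceiling out.
    let cap_raised = lock_table_capacity(256, 100, 0);
    assert!(locks_needed(10_000, 2) <= cap_raised);
}
```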

Tests

29 tests pass on pg18 (insert-correctness tests verify out-of-range inserts widen the page summary so new rows are found and pruning stays active). Production build, clippy -D warnings, and fmt all clean. New tests cover the raw page round-trip and that ambuild persists the summary.

First step toward BRIN-style, in-index incremental summaries (no side table, no stale flag, no REINDEX). Adds index_storage.rs: write/read a length-prefixed byte blob in the index's own metapage (block 0), updated in place and WAL-logged via the Generic WAL API. Because a table_range summary only needs to be over-inclusive, these page updates need no MVCC/transactionality. Round-trip is proven by a pg_test (caught and fixed the Generic-WAL page-hole zeroing by setting pd_lower = pd_upper). Existing 26 tests unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
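The metapage layout described in this commit can be modeled on a plain byte buffer. It is a length-prefixed blob, with `pd_lower = pd_upper` so the data never sits in the page "hole". This pure-Rust sketch makes no PostgreSQL calls, and several of its details are assumptions for illustration:

- the 8192-byte page;
- the 24-byte header offset;
- the `write_blob`/`read_blob`/`simulate_hole_zeroing` names;
- the simplified hole-zeroing step.

The real code goes through the Generic WAL API.

```rust
const PAGE_SIZE: usize = 8192;
/// Assumed start of usable space (PostgreSQL's page header is 24 bytes).
const HEADER: usize = 24;

/// Toy page: a byte buffer plus the two header fields that bound the "hole".
struct Page {
    bytes: Vec<u8>,
    pd_lower: usize,
    pd_upper: usize,
}

impl Page {
    /// A freshly initialized page: the hole spans everything after the header.
    fn new_empty() -> Self {
        Page { bytes: vec![0; PAGE_SIZE], pd_lower: HEADER, pd_upper: PAGE_SIZE }
    }
}

/// Write `blob` as [u32 little-endian length][bytes] right after the header.
fn write_blob(page: &mut Page, blob: &[u8], close_hole: bool) -> Result<(), String> {
    let end = HEADER + 4 + blob.len();
    if end > PAGE_SIZE {
        return Err(format!("summary of {} bytes does not fit on one page", blob.len()));
    }
    page.bytes[HEADER..HEADER + 4].copy_from_slice(&(blob.len() as u32).to_le_bytes());
    page.bytes[HEADER + 4..end].copy_from_slice(blob);
    if close_hole {
        // The fix from the commit message: no [pd_lower, pd_upper) gap remains,
        // so nothing we wrote can be treated as free space.
        page.pd_lower = page.pd_upper;
    }
    Ok(())
}

fn read_blob(page: &Page) -> Option<Vec<u8>> {
    let len = u32::from_le_bytes(page.bytes[HEADER..HEADER + 4].try_into().ok()?) as usize;
    page.bytes.get(HEADER + 4..HEADER + 4 + len).map(|s| s.to_vec())
}

/// Simplified model of standard-layout page handling: bytes in
/// [pd_lower, pd_upper) are considered unused and come back as zeros.
fn simulate_hole_zeroing(page: &mut Page) {
    let (lo, hi) = (page.pd_lower, page.pd_upper);
    page.bytes[lo..hi].fill(0);
}

fn main() {
    let summary = b"min=10;max=25".to_vec();

    let mut buggy = Page::new_empty();
    write_blob(&mut buggy, &summary, false).unwrap();
    simulate_hole_zeroing(&mut buggy);
    assert_ne!(read_blob(&buggy), Some(summary.clone())); // summary lost

    let mut fixed = Page::new_empty();
    write_blob(&mut fixed, &summary, true).unwrap();
    simulate_hole_zeroing(&mut fixed);
    assert_eq!(read_blob(&fixed), Some(summary)); // round-trips
    println!("ok");
}
```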

Adds the IndexSummary/ColSummary types (one entry per indexed column: attnum, minmax-vs-overlap kind, type name, min/max text, null flags) and a compact, versioned byte format persisted into the index metapage via the page-I/O layer. Pure-Rust round-trip + bad-input tests pass on the host. This is the shared currency for the next stages: ambuild builds an IndexSummary and writes it; the planner reads it per partition; aminsert reads/widens/writes it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
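A compact, versioned format that rejects bad input could look like this. The fields mirror the commit message: attnum, minmax-vs-overlap kind, type name, min/max text and null flags. The byte layout, the `FORMAT_VERSION` value, `Reader`, and the function names are hypothetical, since the PR's actual encoding is not shown here.

```rust
const FORMAT_VERSION: u8 = 1; // hypothetical version tag

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { MinMax, Overlap }

#[derive(Debug, Clone, PartialEq)]
struct ColSummary {
    attnum: i16,
    kind: Kind,
    type_name: String,
    min: String, // text form of the lower bound or extent
    max: String,
    has_nulls: bool,
    all_nulls: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct IndexSummary { cols: Vec<ColSummary> }

/// Strings are stored as [u16 LE length][UTF-8 bytes].
fn put_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u16).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Layout: [version u8][ncols u16], then per column:
/// [attnum i16][kind u8][null flags u8][type_name][min][max].
fn serialize(s: &IndexSummary) -> Vec<u8> {
    let mut out = vec![FORMAT_VERSION];
    out.extend_from_slice(&(s.cols.len() as u16).to_le_bytes());
    for c in &s.cols {
        out.extend_from_slice(&c.attnum.to_le_bytes());
        out.push(match c.kind { Kind::MinMax => 0, Kind::Overlap => 1 });
        out.push(u8::from(c.has_nulls) | (u8::from(c.all_nulls) << 1));
        for text in [&c.type_name, &c.min, &c.max] {
            put_str(&mut out, text);
        }
    }
    out
}

/// Bounds-checked cursor: malformed input yields None instead of a panic.
struct Reader<'a> { buf: &'a [u8], pos: usize }

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let buf: &'a [u8] = self.buf;
        let s = buf.get(self.pos..self.pos.checked_add(n)?)?;
        self.pos += n;
        Some(s)
    }
    fn read_u8(&mut self) -> Option<u8> { self.take(1).map(|b| b[0]) }
    fn read_u16(&mut self) -> Option<u16> { Some(u16::from_le_bytes(self.take(2)?.try_into().ok()?)) }
    fn read_str(&mut self) -> Option<String> {
        let n = usize::from(self.read_u16()?);
        String::from_utf8(self.take(n)?.to_vec()).ok()
    }
}

fn deserialize(buf: &[u8]) -> Result<IndexSummary, String> {
    let mut r = Reader { buf, pos: 0 };
    let bad = || String::from("truncated or malformed summary");
    let version = r.read_u8().ok_or_else(bad)?;
    if version != FORMAT_VERSION {
        return Err(format!("unsupported summary version {version}"));
    }
    let n = r.read_u16().ok_or_else(bad)?;
    let mut cols = Vec::with_capacity(usize::from(n));
    for _ in 0..n {
        let attnum = r.read_u16().ok_or_else(bad)? as i16; // same bits as the i16 written
        let kind = match r.read_u8().ok_or_else(bad)? {
            0 => Kind::MinMax,
            1 => Kind::Overlap,
            k => return Err(format!("unknown column kind {k}")),
        };
        let flags = r.read_u8().ok_or_else(bad)?;
        cols.push(ColSummary {
            attnum,
            kind,
            type_name: r.read_str().ok_or_else(bad)?,
            min: r.read_str().ok_or_else(bad)?,
            max: r.read_str().ok_or_else(bad)?,
            has_nulls: flags & 1 != 0,
            all_nulls: flags & 2 != 0,
        });
    }
    if r.pos != buf.len() {
        return Err("trailing bytes after summary".into());
    }
    Ok(IndexSummary { cols })
}

fn main() {
    let s = IndexSummary { cols: vec![ColSummary {
        attnum: 2, kind: Kind::MinMax, type_name: "int8".into(),
        min: "10".into(), max: "25".into(), has_nulls: true, all_nulls: false,
    }] };
    let bytes = serialize(&s);
    assert_eq!(deserialize(&bytes), Ok(s));
    assert!(deserialize(&bytes[..bytes.len() - 1]).is_err()); // truncated
    let mut wrong = bytes.clone();
    wrong[0] = 99;
    assert!(deserialize(&wrong).is_err()); // unknown version
    println!("ok");
}
```

A leading version byte lets a later release refuse, or migrate, summaries written by an older on-disk format.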

… (BRIN-style) Replace the side table + mark-stale + REINDEX model with self-maintaining, index-resident summaries:
- ambuild now writes the per-partition summary to the index's own metapage (index_storage.rs), keyed by nothing but the index itself.
- The planner reads each partition's summary from its index page (cached per plan) instead of an SPI load of a side table.
- aminsert widens the metapage summary in place as rows arrive, with no MVCC needed because the summary only has to be over-inclusive. An insert within range writes nothing; one that extends it grows min/max (scalars, in memory) or the extent (range/geometry, via the type's union). Pruning stays correct AND active across inserts with no REINDEX.
- Remove the table_range_summary side table, the stale flag + per-txn memo, and the sql_drop cleanup event trigger (DROP INDEX frees the summary with the index). Deletes leave the summary conservatively wide (safe); VACUUM/REINDEX re-tighten.

29 tests pass on pg18; production build, clippy -D warnings, and fmt all clean. README and module docs updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…on build) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The planner only puts valid indexes into rel->indexlist, which is where the pathlist hook reads the per-partition summary from. Note this load-bearing dependency so a future change (or an external DDL hook) that invalidates the index doesn't silently disable pruning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The pathlist hook runs once per partition, and for a column predicate it was re-resolving the btree compare proc (three syscache lookups) and re-parsing the query constant for every partition. Both are identical across all partitions of a column, so memoize them per top-level plan (cleared in clear_cache). Cuts our per-partition planning overhead roughly in half: at 2000 partitions, warm planning for a non-key-column predicate drops from ~139ms to ~80ms. This does not change the O(partitions) scaling (PG still expands every partition for a non-key predicate) but materially widens the range of partition counts where pruning's execution win outweighs its planning cost. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

pg_sys::TupleDescAttr is only bound on PG18, where it became an inline C function. On PG13-17 it is a macro that bindgen does not surface, and PG18 also moved attributes to compact_attrs. Add a version-gated att_typid() helper: TupleDescAttr on pg18, direct .attrs access on earlier versions. Fixes the pg16/pg17/postgis CI build failures (only pg18 was exercised locally). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

bitner and others added 7 commits June 22, 2026 17:07

Gate test-only prelude import in index_storage (warning-free producti…

65a1b01


bitner merged commit 5e26e77 into main Jun 23, 2026
6 checks passed

bitner merged 7 commits into main from summary-perf
