Match CHECK constraint exclusion planning speed (per-plan compilation + backend summary cache) by bitner · Pull Request #3 · bitner/pg_table_range

bitner · 2026-06-23T15:48:56Z

Goal of this branch: make non-key pruning as cheap to plan as PostgreSQL's own constraint exclusion. Done — table_range now plans on par with (warm, slightly faster than) CHECK constraint exclusion, via two optimizations grounded in a review of the Postgres source.

How constraint exclusion works (the target)

relation_excluded_by_constraints → get_relation_constraints (plancat.c) reads each partition's CHECK from the relcache (rd_att->constr->check) via table_open(relid, NoLock) — the partition is already locked and cached, so it does zero extra I/O per partition — then proves contradiction with predicate_refuted_by (predtest.c) on in-memory expression trees. Per-partition cost ~5 µs.
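
As an illustration of the refutation step, here is a toy model in Rust. It assumes each partition's CHECK is a half-open integer range and the query predicate is a single comparison against a constant. `CheckRange`, `Pred`, `refuted_by`, and `surviving` are hypothetical names for this sketch; PostgreSQL's predicate_refuted_by works on general expression trees and is not reproduced here.

```rust
/// Hypothetical stand-in for a partition's CHECK: lo <= col AND col < hi.
#[derive(Debug, Clone, Copy)]
pub struct CheckRange {
    pub lo: i64,
    pub hi: i64,
}

/// Hypothetical stand-in for a WHERE clause comparing the column to a constant.
#[derive(Debug, Clone, Copy)]
pub enum Pred {
    Eq(i64),
    Lt(i64),
    Ge(i64),
}

/// True when no value can satisfy both the CHECK and the predicate,
/// so the partition can be skipped without touching its data.
/// Returning false is always safe: it only means "could not prove it".
pub fn refuted_by(check: CheckRange, pred: Pred) -> bool {
    match pred {
        // col = c is impossible when c lies outside [lo, hi)
        Pred::Eq(c) => c < check.lo || c >= check.hi,
        // col < c is impossible when every allowed value is >= c
        Pred::Lt(c) => check.lo >= c,
        // col >= c is impossible when every allowed value is < c
        Pred::Ge(c) => check.hi <= c,
    }
}

/// Indexes of the partitions that survive pruning.
pub fn surviving(parts: &[CheckRange], pred: Pred) -> Vec<usize> {
    parts
        .iter()
        .enumerate()
        .filter(|(_, p)| !refuted_by(**p, pred))
        .map(|(i, _)| i)
        .collect()
}
```

With three partitions covering [0, 10), [10, 20), and [20, 30), a predicate `Pred::Eq(15)` leaves only the middle partition. The whole proof runs on in-memory values, which is why the per-partition cost is so low.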

Where our time went (measured)

A diagnostic attribution at 2,000 partitions split our original ~31 µs/partition: the index-page read was only ~7 µs; evaluation was ~23 µs, dominated by work identical across all partitions but redone per partition — btree_strategy catalog lookups, getTypeInputInfo/fmgr_info setup inside every parse/compare, and constant rendering.

Two optimizations

1. Per-plan compilation (commit 1). Resolve the compare function, type-input function, and operator strategy once per plan (cached FmgrInfos + memos) and reuse them across partitions. Planner-only; the aminsert path keeps the uncached helpers. Result: planning ~88 ms → ~43 ms; evaluation becomes effectively free.

2. Backend summary cache (commit 2, src/summary_cache.rs). Cache each index's deserialized summary for the life of the backend, so warm/repeated plans skip the per-partition index open + metapage read + deserialize entirely (shared via Rc with the per-plan cache). Coherence rests on the over-inclusive invariant: a cached summary is safe unless it is narrower than reality. Only aminsert widens; when it does it sends a relcache invalidation, and a registered callback drops the cached copy everywhere: locally at the next command boundary, in other backends at the widening transaction's commit (matching row visibility). Narrowing operations (delete, vacuum re-tighten) need no invalidation. Widen-invalidations are coalesced to one per index per transaction so bulk loads don't thrash. Result: planning ~43 ms → ~34 ms.
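
The per-plan compilation in optimization 1 can be sketched as a memo that lives for exactly one plan. This is a simplified model, not the code in this branch: `Catalog`, `Resolved`, `PlanMemo`, and `plan` are hypothetical names, the "catalog lookup" is a counter, and the partition summary is reduced to an integer (min, max) pair.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical stand-in for a resolved comparison function
/// (the role a cached FmgrInfo plays in the planner path).
#[derive(Clone, Copy)]
pub struct Resolved {
    pub cmp: fn(i64, i64) -> Ordering,
}

fn cmp_i64(a: i64, b: i64) -> Ordering {
    a.cmp(&b)
}

/// Simulated catalog that counts how often the expensive lookup runs.
pub struct Catalog {
    pub lookups: usize,
}

impl Catalog {
    pub fn resolve(&mut self, _type_oid: u32) -> Resolved {
        self.lookups += 1;
        Resolved { cmp: cmp_i64 }
    }
}

/// Memo scoped to a single plan: resolve once, reuse for every partition.
pub struct PlanMemo {
    by_type: HashMap<u32, Resolved>,
}

impl PlanMemo {
    pub fn new() -> Self {
        PlanMemo { by_type: HashMap::new() }
    }

    pub fn get(&mut self, cat: &mut Catalog, type_oid: u32) -> Resolved {
        *self
            .by_type
            .entry(type_oid)
            .or_insert_with(|| cat.resolve(type_oid))
    }
}

/// Counts partitions whose (min, max) summary may contain `needle`.
/// The memo is created here and dropped on return, so nothing is
/// cached across plans and nothing needs invalidating.
pub fn plan(cat: &mut Catalog, parts: &[(i64, i64)], needle: i64, type_oid: u32) -> usize {
    let mut memo = PlanMemo::new();
    let mut kept = 0;
    for &(min, max) in parts {
        let r = memo.get(cat, type_oid);
        let below = (r.cmp)(needle, min) == Ordering::Less;
        let above = (r.cmp)(needle, max) == Ordering::Greater;
        if !below && !above {
            kept += 1;
        }
    }
    kept
}
```

In this model, planning over 2,000 partitions performs one lookup instead of 2,000, and a second call to `plan` performs one more because the memo does not outlive the plan.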

Result (2,000 partitions, warm, same session)

| | Planning |
|---|---|
| `CHECK` constraint exclusion | ~37 ms |
| table_range before this branch | ~84 ms |
| table_range after | ~34 ms |
| no pruning (baseline expansion) | ~26 ms |

Per-partition planning cost fell from ~31 µs → ~3–4 µs — on par with constraint exclusion, and warm we actually beat it (we serve a cached summary; constraint exclusion re-parses each CHECK every plan). At 300 partitions, pruning-on planning (~4 ms) is now ~equal to pruning-off. A cold first plan still reads each page; every plan after is cached.
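
One way to sanity-check the per-partition figures is to subtract the no-pruning baseline from each planning time and divide by the partition count. That derivation is an assumption of this sketch, not something the measurements above state, and `per_partition_us` is a hypothetical helper.

```rust
/// Planning overhead per partition, in microseconds, relative to the
/// no-pruning baseline. Assumes overhead scales linearly with partitions.
pub fn per_partition_us(planning_ms: f64, baseline_ms: f64, partitions: u32) -> f64 {
    (planning_ms - baseline_ms) * 1000.0 / f64::from(partitions)
}
```

Under that assumption, at 2,000 partitions with a ~26 ms baseline, ~88 ms gives 31 µs, ~84 ms gives 29 µs, ~37 ms (CHECK constraint exclusion) gives 5.5 µs, and ~34 ms gives 4 µs, which lines up with the figures quoted here.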

29 tests pass (including the insert-correctness tests, which exercise widen→invalidate); production build, clippy, and fmt clean. README performance/scaling sections and bench/benchmark.sql updated, including the CHECK constraint-exclusion comparison.

Tradeoff (documented)

The cache benefits read-mostly/repeated planning. For append-heavy workloads where nearly every insert widens the summary, the per-widen relcache invalidation is real cost (coalesced per transaction). Correctness is unconditional; the perf benefit is workload-dependent.
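
The cache and its coalesced invalidation can be modelled in a few lines. This is a single-backend sketch under stated simplifications, not the contents of src/summary_cache.rs: `Disk`, `Backend`, and `Summary` are hypothetical, the local copy is dropped immediately on widen (a conservative stand-in for command-boundary processing), and `invalidations_sent` counts the messages other backends would receive.

```rust
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Summary {
    pub min: i64,
    pub max: i64,
}

/// Simulated index metapages plus a counter of page reads.
pub struct Disk {
    pub pages: HashMap<u32, Summary>,
    pub reads: usize,
}

pub struct Backend {
    cache: HashMap<u32, Rc<Summary>>,
    widened_in_txn: HashSet<u32>,
    pub invalidations_sent: usize,
}

impl Backend {
    pub fn new() -> Self {
        Backend {
            cache: HashMap::new(),
            widened_in_txn: HashSet::new(),
            invalidations_sent: 0,
        }
    }

    /// Planner path: serve from memory when possible, else read and cache.
    /// Panics if the index OID is unknown (fine for a sketch).
    pub fn summary(&mut self, disk: &mut Disk, oid: u32) -> Rc<Summary> {
        if let Some(s) = self.cache.get(&oid) {
            return Rc::clone(s);
        }
        disk.reads += 1;
        let s = Rc::new(disk.pages[&oid]);
        self.cache.insert(oid, Rc::clone(&s));
        s
    }

    /// Insert path: only a widening insert can make a cached copy unsafe.
    pub fn insert(&mut self, disk: &mut Disk, oid: u32, v: i64) {
        let page = disk.pages.get_mut(&oid).expect("unknown index");
        if v < page.min || v > page.max {
            page.min = page.min.min(v);
            page.max = page.max.max(v);
            self.cache.remove(&oid);
            // Coalesce: at most one invalidation per index per transaction.
            if self.widened_in_txn.insert(oid) {
                self.invalidations_sent += 1;
            }
        }
    }

    /// End of transaction: the next widen sends a fresh invalidation.
    pub fn commit(&mut self) {
        self.widened_in_txn.clear();
    }
}
```

In this model, inserts that stay inside the summary cost nothing extra, while a bulk load that widens the same index many times in one transaction sends a single invalidation. An append-heavy workload that widens in every transaction pays one invalidation and one re-read per transaction, which is the cost described above.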

… constraint exclusion

Investigation into why table_range planning is slower than PostgreSQL's own constraint exclusion (the built-in way to prune on a non-key column via a data-range CHECK per partition).

How constraint exclusion works (src/backend/optimizer/util/plancat.c): relation_excluded_by_constraints -> get_relation_constraints reads each partition's CHECK expressions straight from the relcache (relation->rd_att->constr->check) via table_open(..., NoLock) -- the partition is already locked and cached from planning, so it does zero extra I/O and zero extra locking per partition, then predicate_refuted_by proves contradiction on the in-memory expression trees.

Attribution of our ~31us/partition overhead at 2,000 partitions (via a temporary diagnostic) was surprising: the index-page read+deserialize is only ~7us; the *evaluation* was ~23us -- dominated by work that is identical across every partition but was being redone for each one: btree_strategy (3 syscache lookups), getTypeInputInfo and fmgr_info setup inside every text_to_datum / OidFunctionCall, and constant rendering.

Fix: resolve those once per top-level plan and reuse across partitions --

- FMGR_MEMO: a palloc'd FmgrInfo per function, so each compare / input-function call skips fmgr_info's syscache lookup (FunctionCall2Coll / InputFunctionCall);
- INPUT_INFO_MEMO: getTypeInputInfo result per type;
- STRATEGY_MEMO: btree strategy per (operator, left type).

These caches are planner-only and cleared per plan; the aminsert path keeps using the uncached datum_cmp / text_to_datum (no cross-statement cache to invalidate).

Result at 2,000 partitions (warm, same session): full planning ~88ms -> ~43ms; eval is now effectively free (full ~= read-only ~= traversal-only). Versus CHECK constraint exclusion (~33ms) we went from ~2.6x to ~1.3x; the residual ~5us/part is the per-partition index-page read+deserialize.

29 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…traint exclusion

Backend-lifetime cache (src/summary_cache.rs) keyed by index OID, so warm/repeated plans skip the per-partition index open + metapage read + deserialize entirely and serve the summary from memory (shared via Rc with the per-plan cache).

Coherence rests on the over-inclusive invariant: a cached summary is safe as long as it is never *narrower* than the data. Only aminsert widens a summary; when it does it calls CacheInvalidateRelcacheByRelid on the index, and a registered relcache callback drops the cached copy in every backend (locally at the next command boundary, in other backends at the widening txn's commit -- matching row visibility). Operations that only narrow (delete, vacuum re-tighten) need no invalidation. Widen-invalidations are coalesced to one per index per transaction so bulk loads don't thrash.

Result at 2,000 partitions (warm, same session): planning ~43ms -> ~34ms, which now matches -- and slightly beats -- CHECK constraint exclusion (~37ms), because we serve a cached summary while constraint exclusion re-parses each CHECK every plan. At 300 partitions, pruning-on planning (~4ms) is now ~equal to pruning-off. Combined with the earlier per-plan compilation, per-partition planning cost fell from ~31us to ~3-4us. A cold first plan still reads each page; every plan after is cached.

29 tests pass. README performance/scaling sections and benchmark numbers updated accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

bitner and others added 2 commits June 23, 2026 10:48

bitner changed the title ~~Compile predicate evaluation per plan (~2x faster planning; close gap to CHECK exclusion)~~ Match CHECK constraint exclusion planning speed (per-plan compilation + backend summary cache) Jun 23, 2026

bitner merged commit 5210c50 into main Jun 23, 2026
6 checks passed

bitner merged 2 commits into main from prune-perf
