Partial Indexes for Soft Deletes: Eliminating Filter Overhead #
When a table accumulates a high fraction of soft-deleted rows, standard B-tree indexes grow to include entries that active-record queries will never return. The execution plan then shows a Filter: (is_deleted = false) node sitting after the index scan — meaning PostgreSQL traverses and fetches heap pages for deleted rows only to discard them immediately. This page shows how to confirm that specific pathology in EXPLAIN (ANALYZE, BUFFERS) output and resolve it with a targeted partial index. For the broader mechanics of when PostgreSQL chooses partial indexes over full ones, see the partial index implementation guide.
Why a Full Index Becomes Inefficient Under Soft Deletes #
The cost model PostgreSQL uses to choose an access path is anchored in row counts. When is_deleted = true rows represent 70 % or more of a table, the planner estimates that an index scan will return many rows matching the indexed columns — but most of those rows are eliminated post-scan by the soft-delete filter. The cost estimation model charges for every heap page fetch, so the plan accumulates I/O cost even for rows it immediately discards.
The resulting bloat shows three concrete signals in EXPLAIN (ANALYZE, BUFFERS) output:
Rows Removed by Filteris large relative to actual rows returned.shared_blks_readis elevated because deleted rows occupy heap pages that must enter the buffer cache before being dismissed.- The
Filternode appears below the index scan node in the plan tree, confirming the predicate is applied after — not during — the index lookup.
This pattern is distinct from a sequential scan chosen over an index scan, where the planner makes a deliberate cost-based decision to skip the index entirely. Here the index is used — it is simply the wrong index.
Annotated EXPLAIN Evidence #
Run EXPLAIN (ANALYZE, BUFFERS) against the target query before creating any new index:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, created_at
FROM orders
WHERE status = 'processing'
AND is_deleted = false;
A typical pre-fix plan on a table with 80 % deleted rows:
Index Scan using idx_orders_status on orders
(cost=0.56..4821.33 rows=312 width=24)
(actual time=0.041..38.7 rows=310 loops=1)
Index Cond: ((status)::text = 'processing') -- ① predicate the index enforces
Filter: (is_deleted = false) -- ② applied AFTER the index scan
Rows Removed by Filter: 1247 -- ③ 1247 heap fetches wasted
Buffers: shared hit=89 read=342 -- ④ 342 blocks read from disk
Planning Time: 0.3 ms
Execution Time: 39.1 ms
The diagnostic fields to focus on:
- ① The
Index Condcovers onlystatus. The soft-delete predicate is not part of the index condition. - ② The standalone
Filternode is the symptom to eliminate. - ③
Rows Removed by Filter: 1247means 80 % of fetched rows were discarded. This ratio should track your table’s deleted-row ratio. - ④
shared_blks_readat 342 blocks includes pages belonging to deleted rows. Reducing that number is the goal.
If Rows Removed by Filter is less than 10 % of scanned rows, a partial index will provide marginal benefit. If it is above 30 %, the overhead is significant enough to justify the change.
Step-by-Step Resolution Workflow #
1. Record the baseline metrics #
Run the target query with full instrumentation and note three numbers: actual time, Rows Removed by Filter, and shared_blks_read. These are the before values you will compare against.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, created_at
FROM orders
WHERE status = 'processing'
AND is_deleted = false;
2. Inspect the existing index via the catalog #
Confirm which index the planner is using and check its effective size:
SELECT
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE tablename = 'orders'
ORDER BY idx_tup_read DESC;
A large gap between idx_tup_read (entries traversed) and idx_tup_fetch (heap rows actually used) signals that many index entries are being read but discarded — exactly the soft-delete pattern.
3. Create the partial index with CONCURRENTLY #
The index WHERE clause must match the query predicate exactly — same operator, same constant, same column type:
CREATE INDEX CONCURRENTLY idx_orders_active_processing
ON orders (status, created_at DESC)
WHERE is_deleted = false;
CONCURRENTLY avoids an exclusive write lock during index build. The build takes longer but allows normal INSERT/UPDATE/DELETE operations to proceed. On a busy table, prefer this to a blocking CREATE INDEX.
4. Refresh planner statistics #
ANALYZE orders;
Until statistics are refreshed, the planner may underestimate the selectivity of the new index and continue using the old one. If the table is large, you can target just the relevant columns:
ANALYZE orders (status, created_at, is_deleted);
5. Verify the planner switches to the new index #
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, created_at
FROM orders
WHERE status = 'processing'
AND is_deleted = false;
Confirm: the Filter: (is_deleted = false) node is absent. The Index Cond should encode only status, with the partial predicate implicitly satisfied by the index definition. Rows Removed by Filter should be zero or absent.
6. Drop the superseded full index #
Once the new index is confirmed in use, remove the old one to eliminate its write amplification on every INSERT, UPDATE, and DELETE:
DROP INDEX CONCURRENTLY idx_orders_status;
Before / After Plan Comparison #
The two lines that matter most:
| Metric | Before (full index) | After (partial index) |
|---|---|---|
Rows Removed by Filter |
1247 | 0 |
shared_blks_read |
342 | 68 |
The shared_blks_read drop is proportional to the fraction of deleted rows in the table. An 80 % deletion ratio produces roughly an 80 % reduction in buffer reads for active-record queries. This directly reduces cache pressure across all concurrent queries sharing the buffer pool.
Common Pitfalls #
Predicate mismatch prevents index use. The index WHERE is_deleted = false will not be chosen for a query with WHERE NOT is_deleted or WHERE is_deleted IS NOT TRUE. PostgreSQL requires a literal predicate match — rewrite the query or create a separate index for each predicate form.
ORM-generated predicates bypass the exact match. Some ORM query builders inject OR is_deleted IS NULL to handle nullable columns defensively. That compound predicate does not satisfy WHERE is_deleted = false, so the planner falls back to the full index. Inspect the raw SQL your ORM emits with query logging before assuming the index will be used.
Stale statistics after creation. Creating the index alone is not sufficient — ANALYZE must run before the planner knows the index exists in updated form. On tables with autovacuum delays or large recent inserts, trigger a manual ANALYZE as part of the deployment.
Forgetting to drop the old index doubles write amplification. Both indexes must be maintained on every write. Until the old full index is dropped, you gain the read benefit of the partial index while still paying the write cost of the full one. Always confirm the new index is in use before issuing the DROP.
Frequently Asked Questions #
Why does the query planner ignore my partial index after creation? #
The planner bypasses a partial index when statistics are stale or when the query predicate does not exactly match the index WHERE clause. Run ANALYZE on the table immediately after creation. Also verify the query’s is_deleted = false condition matches the index definition verbatim — casting or NULL handling differences can prevent the match.
How do I handle queries that need both active and recently deleted records? #
A partial index only covers rows that satisfy its WHERE predicate. For queries spanning multiple deletion states, keep a separate full index alongside the partial one. PostgreSQL will choose the cheaper path per query — the full index for mixed-state queries, the partial index for active-only queries.
Does a partial index reduce VACUUM overhead? #
Yes. Excluding soft-deleted rows from the index means VACUUM processes fewer index entries during dead-tuple cleanup. The heap still requires standard VACUUM cycles to reclaim space. Monitor n_dead_tup in pg_stat_user_tables to track heap-level churn separately from index churn.
Related #
- Partial Index Implementation — parent: when and how PostgreSQL uses partial index predicates in the cost model
- Building Effective Covering Indexes — combine
INCLUDEcolumns with a partial predicate for index-only scans on active rows - Index Tuning & Strategy — the full index access path taxonomy: B-tree, partial, covering, and specialised index types