Why Index Scans Sometimes Underperform #
An index scan that looked fast in development can degrade into a multi-second bottleneck in production — even when the index is correct and the query returns a modest number of rows. The failure mode is specific: the index amplifies random I/O rather than reducing it. This page isolates the exact planner conditions and heap-access patterns that cause this, and provides a step-by-step path to resolution. For the full decision logic between scan strategies, see the parent page on sequential vs index scans.
The Condition That Triggers Degradation #
PostgreSQL’s cost estimation model assigns each plan node a cost based on expected I/O operations. For an index scan, the planner multiplies the estimated number of matching rows by random_page_cost (default: 4.0) — one random heap fetch per row. For a sequential scan, it multiplies page count by seq_page_cost (default: 1.0).
The break-even point occurs when:
(index_rows × random_page_cost) > (heap_pages × seq_page_cost)
In practice this tips at roughly 10–20% of the table’s rows. Three conditions collapse that threshold even further:
Low heap correlation. The correlation column in pg_stats measures how well the physical order of heap pages matches the index order (range: −1 to 1, where 1 is perfect). A created_at column on an append-only table typically holds correlation near 1.0 — index entries point to pages in the same order they were written. A status column updated randomly across millions of rows may hold correlation near 0.0, meaning consecutive index entries scatter across unrelated heap pages. Each fetch is a fresh random read.
Poor secondary predicate selectivity. When a query carries two predicates but only one is covered by the index, the planner uses the index to satisfy the first predicate and then applies the second as a heap-level filter. Every row that fails the second predicate represents a wasted random heap fetch.
Stale statistics. If pg_statistic was not refreshed after bulk loads or schema changes, the planner’s row estimate can be off by orders of magnitude. It selects an index scan expecting to fetch 200 rows, but 85,000 rows match the index condition at runtime.
Annotated EXPLAIN Evidence #
The following query has a composite predicate but only a single-column index on created_at:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE created_at > '2024-01-01'
AND status = 'pending';
Degraded plan output:
Index Scan using idx_orders_created on orders
(cost=0.43..1250.12 rows=800 width=128) -- planner expects 800 rows
(actual time=0.08..4812.33 loops=1) -- actual: 4.8 seconds
Index Cond: (created_at > '2024-01-01'::timestamp)
Filter: (status = 'pending'::text)
Rows Removed by Filter: 85000 -- 85 k heap fetches discarded
Buffers: shared hit=245 read=12500 -- 12 500 random physical reads
Three signals confirm the failure:
- Row estimate gap: planner predicted 800 rows; 85,800 heap rows were actually visited (the
Rows Removed by Filterplus the returned rows). - Buffers: shared read=12500: 12,500 pages had to be read from disk or OS cache cold — not from the shared buffer pool. That is the random I/O amplifier.
- actual time vs cost mismatch: the planned cost (1250) maps to a fraction of the real runtime. The cost model failed because
random_page_cost = 4.0did not capture true I/O latency at this scale.
Check heap correlation directly #
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname IN ('created_at', 'status');
A correlation below 0.3 on the indexed column is a direct predictor of scattered reads. If status has low correlation and you index on it alone, every index lookup points to a different heap page.
Step-by-Step Resolution Workflow #
Step 1: Refresh statistics.
ANALYZE orders;
This rebuilds histograms and most-common-value lists in pg_statistic. Rerun EXPLAIN (ANALYZE, BUFFERS) immediately after — if the row estimate corrects, the planner may switch to a sequential scan automatically, which can be the right outcome for high-selectivity predicates.
Step 2: Build a covering index to eliminate heap fetches.
CREATE INDEX idx_orders_covering
ON orders (created_at, status)
INCLUDE (customer_id, total_amount);
The composite key (created_at, status) satisfies both predicates at the index level. The INCLUDE columns let the query return customer_id and total_amount without touching the heap at all — producing an Index Only Scan. See covering index design for the full column-ordering strategy.
Step 3: Apply a partial index for the high-activity subset.
If status = 'pending' is a small, active fraction of the table, a partial index scopes the B-tree to only those rows:
CREATE INDEX idx_orders_pending_created
ON orders (created_at)
WHERE status = 'pending';
A smaller index fits in fewer buffer pages, improves cache hit rates, and lowers maintenance cost on every INSERT and UPDATE.
Step 4: Tune random_page_cost to match your storage.
-- NVMe / all-flash arrays: random reads are nearly as fast as sequential
SET LOCAL random_page_cost = 1.1;
-- Verify the plan changes
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE created_at > '2024-01-01' AND status = 'pending';
Lowering random_page_cost makes the planner willing to choose index scans where the default 4.0 would bias it toward sequential scans. Set this in postgresql.conf for a persistent system-wide change, or per-session with SET LOCAL during testing.
Step 5: Disable index scans temporarily to measure the sequential baseline.
SET LOCAL enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE created_at > '2024-01-01' AND status = 'pending';
Compare actual time between the forced sequential scan and the index scan. If the sequential scan is faster, you have confirmed the random I/O problem and can measure the exact improvement after adding the covering index.
Before / After Plan Comparison #
After adding idx_orders_covering and rerunning ANALYZE:
-- Before: single-column index, heap fetches required
Index Scan actual time=0.08..4812.33 Buffers: shared read=12500
-- After: covering index, heap fetch eliminated
Index Only Scan actual time=0.04..18.72 Heap Fetches: 0 Buffers: shared hit=312
The Buffers: shared read collapses to zero. The Heap Fetches: 0 line confirms the query is served entirely from the index pages, which now fit in the shared buffer pool.
Diagram: Index Scan vs Index Only Scan I/O Paths #
Common Pitfalls #
Assuming an index always helps for selective predicates. An index with 5% selectivity can still generate thousands of random reads if the table is large and heap correlation is low. Confirm with pg_stats.correlation before concluding an index will be net positive.
Ignoring Rows Removed by Filter in the plan output. This field counts rows fetched from the heap that were then discarded by a second predicate. Each one represents a wasted random read. A large value is the clearest signal that the index covers the wrong column combination.
Setting random_page_cost globally without validating storage. Reducing it on a system backed by spinning disks will cause the planner to choose index scans that genuinely are slower than sequential scans. Measure actual random vs sequential I/O latency with pg_test_timing or storage benchmarks before adjusting.
Skipping ANALYZE after bulk loads. After inserting or deleting millions of rows, pg_statistic reflects the old distribution. The planner may choose an index scan based on a row estimate that is wrong by two orders of magnitude. Always run ANALYZE immediately after any bulk operation that changes more than 10% of a table.
Frequently Asked Questions #
At what row return percentage does an index scan typically become slower than a sequential scan? #
Generally when an index scan retrieves more than 10–20% of table rows, cumulative random I/O cost exceeds sequential read throughput. This threshold shifts lower for tables with poor heap correlation and higher for NVMe storage where random reads are cheaper. Check pg_stats.correlation — a value below 0.3 is a reliable signal that the threshold is significantly lower than 20%.
How can I force PostgreSQL to skip an index scan for testing? #
Use SET LOCAL enable_indexscan = off; within a transaction to disable index scans for that session. This lets you measure the sequential scan baseline without creating or dropping indexes, and gives you a clean before/after comparison for actual time and Buffers metrics.
Why does an index scan show high actual time but low planned cost in EXPLAIN? #
The gap indicates the cost estimator underestimated physical I/O — typically caused by stale statistics, an incorrect random_page_cost setting, low pg_stats.correlation, or a secondary filter predicate that the index cannot satisfy. Run ANALYZE, review the correlation value, and compare random_page_cost to your actual storage latency profile.
Related #
- Sequential vs Index Scans — parent page covering the full scan-selection decision model
- Covering Index Design — eliminate heap fetches by including all required columns in the index
- Partial Index Implementation — scope an index to the active data subset to improve cache hit rates
- How PostgreSQL Calculates Node Costs — understand how
random_page_costandseq_page_costfeed into plan selection - Execution Plan Fundamentals — grandparent section covering the full operator-tree model