Cost vs Actual Time in EXPLAIN ANALYZE: When the Numbers Don’t Match #

EXPLAIN ANALYZE exposes a friction point that surprises many engineers: the planner’s estimated cost and the executor’s measured wall-clock time often tell completely different stories. This page explains exactly what each number represents, where the gap comes from, and how to close it systematically.

For the broader context of locating the worst node in a plan, see Identifying Plan Bottlenecks.

What the Two Numbers Actually Measure #

The cost value in every plan node is a unitless, optimizer-calculated estimate. It is built from table statistics, index selectivity, and configuration parameters such as seq_page_cost and cpu_tuple_cost. Cost is relative — it exists only to let the planner compare candidate plans against each other. It is not a time prediction.

The actual time pair that EXPLAIN ANALYZE adds is real wall-clock measurement in milliseconds, split into:

These two numbers are measured per iteration of the node. If the node runs inside a loop, you must multiply by loops to get its true contribution to total query time. A node reporting actual time=0.05..0.08 loops=50000 costs 2,500–4,000 ms of wall-clock time — it looks trivial per row, but dominates the query.

The Specific Planner Condition That Causes Divergence #

The root cause is almost always a cardinality estimation error. When pg_statistic data is stale or missing, the planner miscalculates how many rows a scan or join will return. That wrong row count flows upward through the plan tree, corrupting the cost of every operator built on top of it.

Three concrete triggers:

  1. Stale statistics: last_analyze is far in the past, so pg_statistic no longer reflects the current data distribution. The planner’s selectivity estimates for predicates on those columns are wrong.

  2. Non-sargable predicates: Functions in WHERE clauses (WHERE date_trunc('month', created_at) = ...), implicit type casts, or expressions around a column prevent the planner from using the column’s histogram. It falls back to a fixed default selectivity (often 0.5% or 5%) that can miss the real count by orders of magnitude. Understanding how cost estimation models use per-column statistics explains why expression predicates break the model.

  3. Runtime conditions invisible to the planner: Lock waits, I/O saturation, and buffer cache state all affect wall-clock time but are absent from the cost model. A plan can be correctly estimated and still show high actual time under load.

Annotated EXPLAIN Evidence #

The two fields to focus on when cost and actual time diverge:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, SUM(i.quantity * i.unit_price)
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.created_at >= '2024-01-01'
GROUP BY o.order_id;
Hash Join  (cost=320.00..8900.00 rows=120 width=40)
           (actual time=1.2..3840.5 rows=94000 loops=1)
  -- ^^^ rows estimate: 120; actual rows: 94000 — off by 783x.
  --     This single mismatch forces the Hash Join to build a
  --     table 783x larger than planned, spilling to disk.
  Buffers: shared hit=1840 read=22300
  ->  Seq Scan on orders  (cost=0.00..210.00 rows=120 width=16)
                          (actual time=0.04..180.3 rows=94000 loops=1)
        -- ^^^ The stale estimate originates here.
        Filter: (created_at >= '2024-01-01')
        Rows Removed by Filter: 6000
  ->  Hash  (cost=90.00..90.00 rows=8200 width=28)
            (actual time=0.8..0.8 rows=8200 loops=1)
        Batches: 4  Memory Usage: 4096kB
        -- ^^^ Batches: 4 confirms disk spill. The hash table
        --     had to be rebuilt across 4 passes because work_mem
        --     was not enough for the actual 94000-row build side.

The two diagnostic fields are rows vs actual rows (cardinality estimation accuracy) and Batches (memory spill indicator). Both are visible in text-format EXPLAIN ANALYZE; the JSON format (FORMAT JSON) additionally exposes Actual Rows, Plan Rows, and Peak Memory Usage as structured fields that tools can parse.

Visualising the Estimation Gap #

The diagram below shows how a row-count error at a base scan propagates upward through a Hash Join and into the final aggregation node, inflating each parent’s actual cost beyond what the planner modelled.

Row count estimation error propagating up the plan tree A three-node plan tree: Seq Scan underestimates rows (120 vs 94000), which propagates into Hash Join causing disk spill, then into GroupAggregate causing excess CPU work. Seq Scan on orders est rows: 120 actual rows: 94 000 stale statistics Hash order_items (8 200 rows) Batches: 4 disk spill Hash Join est cost: 8 900 | est rows: 120 actual time: 3 840 ms | rows: 94 000 inherited cardinality error GroupAggregate est rows: 120 groups actual rows: 94 000 groups excess CPU + memory

Step-by-Step Resolution Workflow #

Step 1. Capture full node metrics, including buffer I/O:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.order_id, c.customer_name, SUM(i.quantity * i.unit_price)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items i ON o.order_id = i.order_id
WHERE o.created_at >= '2024-01-01'
GROUP BY o.order_id, c.customer_name;

Step 2. Find the node with the worst row estimate ratio. Look for rows (estimate) vs actual rows differing by 10x or more — that node is the origin of the cost divergence.

Step 3. Verify statistics freshness for the table where the estimate is wrong:

SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_dead_tup,
       n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('orders', 'order_items');
-- n_mod_since_analyze >> 0 on a large table means statistics are stale.

Step 4. Refresh statistics on the affected tables, then re-run EXPLAIN ANALYZE:

ANALYZE VERBOSE orders;
ANALYZE VERBOSE order_items;

-- Verify the column histogram was updated:
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'created_at';

Step 5. If Batches > 1 or Sort Method: external merge appear, test whether increasing work_mem eliminates the spill:

SET LOCAL work_mem = '128MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT /* your query here */;
-- Compare Batches: before and after.
-- If Batches drops to 1, the spill was the bottleneck.

Step 6. Re-run EXPLAIN (ANALYZE, BUFFERS) after each change and confirm that rows and actual rows converge and that total actual time decreases.

Before/After Plan Comparison #

After ANALYZE refreshed the column statistics:

-- BEFORE (stale statistics)
Seq Scan on orders  (cost=0.00..210.00 rows=120 ...)
                    (actual time=0.04..180.3 rows=94000 ...)
  -- 783x row underestimate; downstream Hash Join spills to 4 batches.

-- AFTER (fresh statistics)
Seq Scan on orders  (cost=0.00..1240.00 rows=94000 ...)
                    (actual time=0.04..42.1 rows=94000 ...)
  -- Estimate matches reality; Hash Join fits in memory (Batches: 1).

The planner’s cost rose significantly once it knew the true cardinality, but the actual time dropped from 3,840 ms to 42 ms because the correct row count let it allocate enough work_mem and choose an in-memory hash build.

Common Pitfalls #

Treating cost as milliseconds. Cost is a dimensionless comparative heuristic. A cost of 8,900 does not mean 8,900 ms. Equating them leads to misguided tuning decisions.

Ignoring the loops multiplier. Nested loop inner scans show per-iteration times. Multiply actual time by loops to measure the node’s real contribution. This is the single most common misread in plan analysis.

Tuning work_mem without checking Batches first. Raising work_mem globally is expensive. Verify the spill indicator (Batches > 1 for hash nodes, Sort Method: external merge for sort nodes) before changing memory settings. If spill is not the issue, extra memory has no effect.

Fixing cost without re-checking actual time. After running ANALYZE, re-run EXPLAIN ANALYZE to confirm the actual time improved. Statistics refresh changes the plan choice, but the new plan still needs to be validated against real execution.

Frequently Asked Questions #

Why does EXPLAIN ANALYZE show a low cost but high actual time?

Low cost with high actual time indicates inaccurate cardinality estimates. The planner chose an inefficient join strategy or scan based on wrong row count predictions. Stale statistics, missing indexes, or non-sargable predicates force the executor to process far more rows than anticipated. Run ANALYZE on the affected tables to update pg_statistic.

Should I optimize based on cost or actual time?

Always use actual time and actual rows as your primary signal. Cost is a planning heuristic — it guides plan selection but does not reflect real hardware performance, I/O latency, or memory pressure. Use cost only to understand why the planner made a specific choice, then validate any change against actual time.

How do I account for the loops multiplier when reading actual time?

Multiply the per-iteration actual time total by loops. A node showing actual time=0.05..0.08 loops=50000 contributes 2,500–4,000 ms of total wall-clock work despite its tiny per-iteration figure. Missing this multiplier is one of the most common errors when interpreting parallel query execution plans or nested loop plans.