PostgreSQL · performance · query optimization

PostgreSQL Query Optimization: Indexes, EXPLAIN, and Beyond

Master the art of tuning queries for speed and efficiency

❝ PostgreSQL is a powerful relational database, but even the best database can suffer from slow queries without proper optimization. Understanding how the query planner works, using indexes effectively, and reading EXPLAIN output are essential skills for any developer or DBA. This guide takes you from the basics to advanced optimization techniques. ❞

We'll cover index types (B‑tree, GiST, GIN, BRIN), how to analyze query plans with EXPLAIN, statistics tuning, partitioning, and common anti‑patterns. By the end, you'll be able to identify bottlenecks and make your PostgreSQL queries lightning fast.

1. How PostgreSQL Executes a Query

When you send a query, PostgreSQL goes through several stages: the parser checks syntax and builds a parse tree, the rewriter applies rules such as view expansion, the planner/optimizer compares possible execution plans and picks the cheapest, and the executor runs the chosen plan.

The planner's decisions depend on table statistics (collected by ANALYZE) and available indexes. A poorly informed planner can make terrible choices, which is why regular VACUUM ANALYZE is crucial.

2. Index Types and When to Use Them

Indexes dramatically speed up data retrieval but add overhead on writes. Choosing the right index type is key.

📚 B‑Tree (default)

Best for equality and range queries (=, >, <, BETWEEN) and anchored prefix matches (LIKE 'prefix%'; with a non-C collation this needs a text_pattern_ops index). Use for most columns with high cardinality.

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);

🗂️ GiST & SP‑GiST

Generalized Search Tree – great for geometric data, full‑text search, and nearest‑neighbor queries. SP‑GiST supports space‑partitioned structures such as quadtrees and radix trees, useful for points and text prefixes.

CREATE INDEX idx_locations ON points USING gist(location);

📦 GIN

Generalized Inverted Index – perfect for arrays, JSONB, and full‑text search. Fast for containment (@>, ?).

CREATE INDEX idx_tags ON articles USING gin(tags);
CREATE INDEX idx_data ON docs USING gin(data jsonb_path_ops);

📊 BRIN

Block Range INdex – ideal for very large tables with natural ordering (e.g., time‑series). Lightweight and small.

CREATE INDEX idx_events_time ON events USING brin(created_at);

Partial Indexes: Index only a subset of rows (e.g., WHERE active = true). Reduces size and improves speed.

CREATE INDEX idx_active_users ON users(id) WHERE active = true;

3. EXPLAIN: The Query Optimizer's Blueprint

EXPLAIN shows the execution plan without running the query. EXPLAIN ANALYZE runs the query and adds actual timing and row counts.

EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM orders WHERE customer_id = 12345;

Key elements to look for: the scan type (Seq Scan vs. Index Scan vs. Bitmap Heap Scan), estimated rows vs. actual rows (large mismatches suggest stale statistics), the cost figures, and, with BUFFERS, how much data came from cache versus disk. For example:

-> Index Scan using idx_orders_customer on orders  (cost=0.29..12.34 rows=45 width=64)
   Index Cond: (customer_id = 12345)

If you see a sequential scan on a large table, consider adding an index.
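
For contrast, a missing index typically shows up as a sequential scan with a filter. The output below is illustrative; your costs and row counts will differ:

```
Seq Scan on orders  (cost=0.00..18334.00 rows=45 width=64)
  Filter: (customer_id = 12345)
  Rows Removed by Filter: 999955
```

A large "Rows Removed by Filter" count means the scan read far more rows than it returned, which is exactly the work an index would avoid.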

4. Advanced Indexing Patterns

📌 Composite Indexes

Order matters! A B‑tree composite index serves queries that filter on a leftmost prefix of its columns, so put equality‑filtered columns first, followed by the columns you range‑filter or sort on.

CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at);
-- Supports WHERE customer_id = ? ORDER BY created_at

🔍 Covering Indexes (INCLUDE)

Include extra columns to avoid accessing the table (index‑only scans).

CREATE INDEX idx_users_email_cover ON users(email) INCLUDE (name, status);
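
With the covering index above, a query touching only the indexed and included columns can be answered without visiting the table (shown as "Index Only Scan" in EXPLAIN, provided the visibility map is reasonably current):

```sql
-- Served entirely from idx_users_email_cover
SELECT name, status FROM users WHERE email = 'user@example.com';
```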

🧮 Expression Indexes

Index on a function or expression, e.g., LOWER(email) for case‑insensitive searches.

CREATE INDEX idx_users_lower_email ON users(LOWER(email));
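
The query must use the same expression for the planner to match the index:

```sql
-- Matches idx_users_lower_email; a plain index on email would not be used here
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```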

📦 JSONB Indexing

Use gin for containment and existence; btree for specific keys.

CREATE INDEX idx_data_gin ON docs USING gin(data);
CREATE INDEX idx_data_key ON docs ((data->>'key'));
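
A containment query that such a GIN index can serve, assuming a docs table with a jsonb data column as in the earlier examples:

```sql
-- @> asks: does data contain this JSON fragment?
SELECT * FROM docs WHERE data @> '{"status": "active"}';
```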

5. Keeping Statistics Fresh: ANALYZE and Autovacuum

The planner relies on table statistics. Out‑of‑date statistics lead to bad plans. Run ANALYZE after significant data changes, or rely on autovacuum.

ANALYZE VERBOSE users;

You can adjust the statistics target for specific columns to get more accurate row estimates:

ALTER TABLE users ALTER COLUMN email SET STATISTICS 1000;
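
After the next ANALYZE, you can verify what the planner now believes about the column via the pg_stats view:

```sql
-- n_distinct and most_common_vals drive selectivity estimates
SELECT attname, n_distinct, null_frac, most_common_vals
FROM pg_stats
WHERE tablename = 'users' AND attname = 'email';
```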

Check autovacuum status with pg_stat_user_tables. Ensure it's running regularly to prevent bloat.
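
A quick health check of autovacuum activity, using columns from pg_stat_user_tables:

```sql
-- Tables never autovacuumed sort first
SELECT relname, last_autovacuum, last_autoanalyze, autovacuum_count
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST
LIMIT 20;
```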

6. Common Anti-Patterns and How to Fix Them

❌ SELECT *

Retrieves unnecessary columns, causing more I/O and possibly preventing index‑only scans. Always list needed columns.

❌ Using functions in WHERE clauses

WHERE LOWER(email) = 'user@example.com' cannot use a plain index on email; it needs an expression index on LOWER(email). Either create that expression index or rewrite the predicate to match an existing index.

❌ OR conditions on different columns

WHERE col1 = x OR col2 = y may not use indexes efficiently. Consider UNION or multiple queries.
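
A sketch of the UNION rewrite, assuming separate indexes exist on the two columns (the username column here is illustrative):

```sql
-- Each branch can use its own index; UNION removes duplicate rows
SELECT * FROM users WHERE email = 'a@example.com'
UNION
SELECT * FROM users WHERE username = 'alice';
```

If duplicates are impossible or acceptable, UNION ALL avoids the deduplication step and is cheaper.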

❌ LIKE with leading wildcard

LIKE '%term%' cannot use B‑tree indexes. Use full‑text search or trigram indexes (pg_trgm).

CREATE EXTENSION pg_trgm;
CREATE INDEX idx_trgm ON articles USING gin (title gin_trgm_ops);
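
With a trigram index in place, unanchored pattern matches become indexable. Assuming an articles table with a text title column:

```sql
-- LIKE/ILIKE '%term%' can now use the GIN trigram index
SELECT * FROM articles WHERE title ILIKE '%optimization%';
```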

7. Optimizing Joins

Join order and join method (nested loop, hash join, merge join) heavily impact performance; the planner chooses both based on statistics, available indexes, and memory settings such as work_mem. For expensive aggregating joins that are read far more often than they change, consider precomputing the result in a materialized view.

-- Example: materialized view
CREATE MATERIALIZED VIEW order_summary AS
SELECT c.name, SUM(o.total) as total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;
REFRESH MATERIALIZED VIEW order_summary;
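
To see which join strategy the planner chose for a given query, inspect the plan and look for Nested Loop, Hash Join, or Merge Join nodes:

```sql
EXPLAIN
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.id = 42;
```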

8. Table Partitioning (Declarative Partitioning)

For tables with millions of rows, partitioning can dramatically improve query performance and maintenance. PostgreSQL supports range, list, and hash partitioning.

CREATE TABLE events (
    id serial,
    created_at timestamptz NOT NULL,
    data jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Queries that filter on created_at will only scan relevant partitions

Partition pruning happens automatically when the planner can eliminate partitions.
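
You can confirm pruning with EXPLAIN; only the matching partition should appear in the plan:

```sql
EXPLAIN SELECT * FROM events
WHERE created_at >= '2024-01-10' AND created_at < '2024-01-20';
-- Expect a scan on events_2024_01 only, not on other partitions
```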

9. PostgreSQL Configuration Tuning

Sometimes the issue is server configuration. Key parameters to adjust (based on available RAM): shared_buffers (commonly around 25% of RAM), effective_cache_size (the planner's estimate of total cache, often 50–75% of RAM), work_mem (memory per sort or hash operation, so keep it modest on servers with many concurrent queries), and maintenance_work_mem (used by VACUUM and index builds).

Use pg_settings and tools like pgtune to get baseline recommendations.
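
A hedged starting point for a dedicated 16 GB server; these values are illustrative and should always be validated against your workload:

```sql
ALTER SYSTEM SET shared_buffers = '4GB';         -- ~25% of RAM (requires a restart)
ALTER SYSTEM SET effective_cache_size = '12GB';  -- planner's estimate of OS + PG cache
ALTER SYSTEM SET work_mem = '64MB';              -- per sort/hash node, per query
ALTER SYSTEM SET maintenance_work_mem = '1GB';   -- VACUUM, CREATE INDEX
SELECT pg_reload_conf();                         -- applies the reloadable settings
```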

10. Identifying Slow Queries with pg_stat_statements

The pg_stat_statements extension tracks query performance statistics. It must be preloaded via shared_preload_libraries (which requires a restart) before you can create the extension.

CREATE EXTENSION pg_stat_statements;

SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- On PostgreSQL 12 and older, the timing columns are total_time and mean_time

This reveals which queries are the slowest and how often they run. Reset with pg_stat_statements_reset().

Enable logging of slow queries with log_min_duration_statement = 1000 to log queries taking >1s.
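
The slow-query threshold can be set cluster-wide and reloaded without a restart:

```sql
ALTER SYSTEM SET log_min_duration_statement = '1000';  -- milliseconds
SELECT pg_reload_conf();
```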

11. Case Study: From 8s to 100ms

A large e‑commerce site had a product search query taking 8 seconds. After analyzing the execution plan and applying targeted index and query fixes, query time dropped to 120ms. Further optimization with partitioning by category brought it under 100ms. The site saw a 30% increase in conversions due to faster search.

12. When the Planner Gets It Wrong: Force Plan with pg_hint_plan

Sometimes the planner chooses a suboptimal join order or index. The extension pg_hint_plan (loaded via shared_preload_libraries or a session-level LOAD) lets you override the plan with hints embedded in comments.

-- Example: force a nested loop join
/*+ NestLoop(t1 t2) IndexScan(t2 idx_t2_col) */
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.col = 1;

Use sparingly; first try to fix statistics or indexes.
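
For one-off diagnosis, PostgreSQL's built-in planner switches can also reveal whether a different plan would be faster. This is a session-local experiment, not a production fix:

```sql
SET enable_seqscan = off;  -- discourage sequential scans in this session only
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.col = 1;
RESET enable_seqscan;
```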

13. Managing Bloat and Vacuum

PostgreSQL's MVCC leaves dead rows. If autovacuum can't keep up, table bloat occurs, causing slower scans and increased I/O.

Monitor bloat with:

SELECT schemaname, tablename, n_dead_tup, n_live_tup,
       round(100 * n_dead_tup / (n_live_tup + n_dead_tup)::numeric, 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_ratio DESC;

If dead rows accumulate faster than autovacuum can clean them, run VACUUM FULL (which rewrites the table under an exclusive lock) or use pg_repack (which rebuilds it online).
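
Before reaching for VACUUM FULL, note that a plain VACUUM makes dead space reusable without taking an exclusive lock:

```sql
VACUUM (VERBOSE, ANALYZE) users;  -- reclaims dead tuples and refreshes statistics
```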

14. Optimization Checklist

- Identify the slowest queries with pg_stat_statements and slow-query logging
- Read plans with EXPLAIN (ANALYZE, BUFFERS); watch for sequential scans on large tables and big estimate-vs-actual gaps
- Choose the right index type (B‑tree, GiST, GIN, BRIN) and consider composite, partial, covering, and expression indexes
- Keep statistics fresh with ANALYZE and a healthy autovacuum
- Avoid anti‑patterns: SELECT *, non‑indexable functions in WHERE, leading‑wildcard LIKE without trigram support
- Partition very large naturally ordered tables and verify partition pruning
- Tune memory settings (shared_buffers, work_mem, effective_cache_size)
- Monitor and manage table bloat

Final Thoughts: Optimize Iteratively

Query optimization is an iterative process. Start by identifying slow queries, analyze their execution plans, and apply targeted fixes—whether adding an index, rewriting the query, or tuning configuration. PostgreSQL provides powerful tools to inspect and improve performance; the key is to understand the data distribution and the planner's decisions. With practice, you'll be able to diagnose issues quickly and keep your database responsive even as it grows.

Happy optimizing — may your queries run in milliseconds.