PostgreSQL Performance Tuning: Complete Optimization Guide
Optimize PostgreSQL database performance with indexing strategies, query optimization, configuration tuning, and monitoring best practices for production workloads.
Performance Fundamentals
Key Performance Factors:
- Proper indexing strategy
- Query optimization
- Configuration tuning
- Hardware resources (CPU, RAM, I/O)
- Connection pooling
- Vacuuming and maintenance
Monitoring Goals:
- Query response time under 100ms for simple queries
- Connection pool efficiency above 80%
- Cache hit ratio above 95%
- Index usage for filtered queries
- Minimal table scans on large tables
Indexing Strategies
B-Tree Indexes (Default)
Best for equality and range queries:
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Multi-column index (order matters!)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Partial index (conditional)
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Index with included columns (covering index)
CREATE INDEX idx_users_email_covering ON users(email)
INCLUDE (name, created_at);
Usage:
-- Uses idx_users_email
SELECT * FROM users WHERE email = 'john@example.com';
-- Uses idx_orders_customer_date
SELECT * FROM orders
WHERE customer_id = 123 AND order_date >= '2024-01-01';
GIN Indexes (Generalized Inverted Index)
Best for array, JSONB, full-text search:
-- JSONB index
CREATE INDEX idx_products_metadata ON products
USING GIN (metadata jsonb_path_ops);
-- Array index
CREATE INDEX idx_tags ON articles USING GIN (tags);
-- Full-text search
CREATE INDEX idx_content_search ON articles
USING GIN (to_tsvector('english', content));
Usage:
-- JSONB query
SELECT * FROM products
WHERE metadata @> '{"category": "electronics"}';
-- Array containment
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];
-- Full-text search
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');
BRIN Indexes (Block Range Index)
Best for large, naturally ordered tables:
CREATE INDEX idx_events_timestamp ON events
USING BRIN (created_at);
Advantages:
- Tiny index size (often orders of magnitude smaller than a comparable B-tree)
- Great for time-series data
- Minimal maintenance overhead
Trade-offs:
- Less precise than B-tree
- Only effective if data physically ordered
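Whether a column is physically well ordered can be checked from the planner statistics; a minimal sketch for the events table above:
-- correlation near 1 or -1 means the column closely follows physical row order (a good fit for BRIN)
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';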
Hash Indexes
Best for exact equality only:
CREATE INDEX idx_users_id_hash ON users USING HASH (id);
Note: a B-tree index is usually the better choice because of its versatility
Query Optimization
Using EXPLAIN ANALYZE
Identify slow queries:
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 100;
Key Metrics:
- Seq Scan: Table scan (bad for large tables)
- Index Scan: Using index (good)
- Nested Loop: Join method (expensive for large datasets)
- Hash Join: Efficient join for large tables
- Execution Time: Actual query runtime
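Adding the BUFFERS option shows how much of the work was served from shared buffers versus disk, which complements the cache hit ratio check later in this guide; a minimal example against the orders table used above:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 123;
-- Look for "Buffers: shared hit=... read=..." in the plan nodes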
Avoid N+1 Queries
BAD: Multiple queries:
users = db.query("SELECT * FROM users")
for user in users:
    # One extra query per user: N+1 round trips in total
    orders = db.query(f"SELECT * FROM orders WHERE customer_id = {user.id}")
GOOD: Single join query:
results = db.query("""
SELECT u.*, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
""")
Optimize Subqueries
BAD: Subquery in SELECT:
SELECT
u.name,
(SELECT COUNT(*) FROM orders WHERE customer_id = u.id) AS order_count
FROM users u;
GOOD: Use JOIN with GROUP BY:
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
GROUP BY u.id, u.name;
Use EXISTS Instead of IN
BAD: IN with subquery:
SELECT * FROM users
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);
GOOD: EXISTS (stops at first match):
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = u.id AND o.total > 1000
);
Limit Result Sets
-- Always use LIMIT for pagination
SELECT * FROM large_table
ORDER BY created_at DESC
LIMIT 50 OFFSET 100;
-- Better: Use cursor-based pagination
SELECT * FROM large_table
WHERE id > last_seen_id
ORDER BY id
LIMIT 50;
Configuration Tuning
Memory Settings
postgresql.conf:
# Shared buffers (25% of RAM)
shared_buffers = 8GB
# Effective cache size (75% of RAM)
effective_cache_size = 24GB
# Work memory per operation
work_mem = 64MB
# Maintenance work memory
maintenance_work_mem = 2GB
# WAL buffers
wal_buffers = 16MB
Guidelines:
- shared_buffers: 25% of total RAM (up to ~40%)
- effective_cache_size: 50-75% of total RAM
- work_mem: depends on concurrency; a common formula is (RAM - shared_buffers) / (max_connections * 3)
- maintenance_work_mem: higher values speed up VACUUM and index creation
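Rather than raising work_mem globally, it can be increased for a single session running a heavy sort or aggregation; a minimal sketch reusing the orders table from earlier examples:
-- Raise work_mem only for this session's report query, then put it back
SET work_mem = '256MB';
SELECT customer_id, SUM(total) AS lifetime_total
FROM orders
GROUP BY customer_id
ORDER BY lifetime_total DESC;
RESET work_mem;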
Connection Settings
# Maximum concurrent connections
max_connections = 200
# Use connection pooling (PgBouncer) for higher concurrency
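A quick way to see how close the server is to the connection limit:
SELECT count(*) AS current_connections FROM pg_stat_activity;
SHOW max_connections;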
Connection Pooling with PgBouncer:
[databases]
mydb = host=localhost port=5432 dbname=production
[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
Write Performance
# WAL settings for performance
wal_level = replica
synchronous_commit = off # Accept small data loss for speed
wal_compression = on
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
Trade-offs:
- synchronous_commit = off: faster writes, but recently committed transactions can be lost on a crash
- Higher checkpoint_timeout: fewer I/O spikes, but longer crash recovery time
Autovacuum Tuning
# Autovacuum settings
autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
Purpose:
- Remove dead tuples
- Update statistics for query planner
- Prevent transaction ID wraparound
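For very large or write-heavy tables, per-table overrides are often more effective than global settings; a sketch using the orders table from earlier examples:
-- Vacuum orders after roughly 2% dead tuples instead of the global 10%
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);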
Monitoring and Diagnosis
Slow Query Log
# Log queries slower than 100ms
log_min_duration_statement = 100
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
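Both settings can be changed without editing postgresql.conf by hand, and neither requires a restart; a minimal sketch:
ALTER SYSTEM SET log_min_duration_statement = 100;
SELECT pg_reload_conf();  -- apply the change without restarting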
Active Queries
SELECT
pid,
now() - query_start AS duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
Blocking Queries
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.query AS blocked_query,
blocking_locks.pid AS blocking_pid,
blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
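Once a blocking pid has been identified, it can be cancelled or, as a last resort, terminated (12345 below is a placeholder pid):
SELECT pg_cancel_backend(12345);     -- cancel the blocking query
SELECT pg_terminate_backend(12345);  -- terminate the whole session if cancelling is not enough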
Cache Hit Ratio
SELECT
sum(heap_blks_read) AS heap_read,
sum(heap_blks_hit) AS heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_ratio
FROM pg_statio_user_tables;
Target: Above 0.95 (95% hit ratio)
Index Usage
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Action: Drop unused indexes to reduce write overhead and reclaim space
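An example of dropping such an index without blocking concurrent writes (idx_unused_example is a placeholder name):
DROP INDEX CONCURRENTLY idx_unused_example;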
Table Bloat
SELECT
schemaname,
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS size,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 0
ORDER BY n_dead_tup DESC;
Solution: VACUUM FULL or pg_repack for severe bloat
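For example, a full rewrite of a badly bloated table (the table name is a placeholder; note that VACUUM FULL takes an exclusive lock for the duration):
VACUUM (FULL, VERBOSE, ANALYZE) orders;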
Advanced Optimization
Partitioning Large Tables
Range Partitioning:
CREATE TABLE events (
id SERIAL,
event_type VARCHAR(50),
event_data JSONB,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Create indexes on partitions
CREATE INDEX ON events_2024_01(event_type);
CREATE INDEX ON events_2024_02(event_type);
Benefits:
- Faster queries on time-ranges
- Easier data archival
- Improved maintenance performance
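The archival benefit in practice: an old partition can be detached and handled independently of the live table, as in this sketch using the partitions above:
-- Detach the oldest partition; it becomes a standalone table that can be dumped or archived
ALTER TABLE events DETACH PARTITION events_2024_01;
-- Drop it once the data has been archived elsewhere
DROP TABLE events_2024_01;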
Materialized Views
Pre-computed aggregates:
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
DATE(created_at) AS sale_date,
product_category,
COUNT(*) AS transaction_count,
SUM(amount) AS total_sales
FROM sales
GROUP BY DATE(created_at), product_category;
CREATE UNIQUE INDEX ON daily_sales_summary (sale_date, product_category);
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary;
Benefits:
- Instant query response for complex aggregations
- Reduce database load
- CONCURRENTLY option allows reads during refresh
Parallel Query Execution
# Enable parallel execution
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
parallel_tuple_cost = 0.1
Encourage the planner to parallelize a query:
SET max_parallel_workers_per_gather = 4;
SELECT COUNT(*) FROM large_table
WHERE category = 'electronics';
Best For:
- Large table scans
- Aggregations on millions of rows
- Data warehouse queries
Backup and Recovery
Physical Backups (pg_basebackup)
pg_basebackup -D /backup/pgdata -Ft -z -P
Advantages:
- Full cluster backup
- Fastest restore
- Point-in-time recovery with WAL
Logical Backups (pg_dump)
# Single database
pg_dump -Fc mydb > mydb.dump
# All databases
pg_dumpall > all_databases.sql
# Restore
pg_restore -d mydb mydb.dump
Advantages:
- Selective restore
- Cross-version compatible
- Text format for inspection
Point-in-Time Recovery (PITR)
# Enable WAL archiving
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/%f'
Restore to specific time:
# In postgresql.conf (PostgreSQL 12+; older releases used recovery.conf), plus an empty recovery.signal file
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
High Availability
Streaming Replication
Primary server:
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB
Standby server:
primary_conninfo = 'host=primary.example.com port=5432'
restore_command = 'cp /archive/%f %p'
Benefits:
- Real-time replication
- Read replicas for scaling
- Automatic failover with tools (Patroni, repmgr)
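Replication health can be checked from the primary; a minimal sketch:
-- Connected replicas and how far each one lags behind in bytes
SELECT client_addr, state, sent_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;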
Connection Pooler (PgBouncer)
Setup:
[databases]
production = host=localhost port=5432
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
Benefits:
- Reduced connection overhead
- Higher concurrency
- Connection reuse
Best Practices Summary
- Index wisely: Cover filtered columns, avoid over-indexing
- Use EXPLAIN ANALYZE: Understand query execution plans
- Tune memory settings: shared_buffers, work_mem based on workload
- Connection pooling: Use PgBouncer for high concurrency
- Regular VACUUM: Keep autovacuum properly configured
- Monitor cache hit ratio: Target above 95%
- Partition large tables: Improve query and maintenance performance
- Use materialized views: For complex, repeated aggregations
- Enable query logging: Log slow queries for optimization
- Regular backups: Automated daily backups with PITR
Troubleshooting Common Issues
Slow queries: Run EXPLAIN ANALYZE, check for seq scans on large tables
High CPU usage: Identify expensive queries, optimize or add indexes
Memory issues: Reduce work_mem or max_connections, add RAM
Connection pool exhausted: Increase pool size or optimize queries
Bloat: Run VACUUM or pg_repack on bloated tables
Lock contention: Identify blocking queries, optimize transaction duration
Bottom Line
PostgreSQL performance requires continuous optimization through proper indexing, query tuning, and configuration. Start with EXPLAIN ANALYZE to identify bottlenecks. Use connection pooling for concurrency. Monitor cache hit ratio and index usage. Partition large tables and use materialized views for analytics. Regular maintenance via autovacuum is critical. With proper tuning, PostgreSQL can sustain hundreds of thousands of transactions per second on modern hardware.