What is WAL?
Write-Ahead Logging (WAL) is PostgreSQLs mechanism to ensure data durability and crash recovery.
The core principle of WAL is simple
Changes must be recorded in the WAL before corresponding data pages are written to the disk
When a transaction performs an INSERT, UPDATE, DELETE, or TRUNCATE operation, PostgreSQL generates WAL records describing those changes. These records are appended to the WAL stream and stored in WAL segment files located in the pg_wal directory.
When transaction commits, PostgreSQL Writes a commit record to the WAL and flushes the WAL records to durable storage. Once the WAL has been safely persisted, PostgreSQL acknowledges the transaction as committed and returns success to the client.
At this point, the modified table pages and index pages still reside only in memory (shared buffers). They are written to the actual data files later by the PostgreSQL background process. Note that the changes are not persisted to data files but the changes are persisted in WAL files as transaction logs. At this point if database crashes, the data changes are recovered from the WAL files.
Pages that have been modified in shared buffers but not yet written to disk are called dirty pages.
PostgreSQL periodically performs checkpoints. During a checkpoint dirty pages are flushed from shared buffers to the actual data or index files on disk. Checkpointing reduces the amount of WAL that must be replayed during crash recovery.
This design allows PostgreSQL to provide durability without forcing every modified data page to be flushed during each transaction commit.
Note - Since database transactions can happen in huge numbers, the WAL is stored in multiple files called WAL segments. By default each segment is 16 MB in size. Internally, PostgreSQL organizes storage into 8 KB pages.
Full Page Writes
While WAL protects committed transactions, another problem can occur during a crash: a data page write can be interrupted halfway through, leaving behind partially writen pages. This is called torn page. To protect againts this PostgreSQL uses Full Page Writes (FPW). After every checkpoint, the first page modication causes PostgreSQL to write the entire 8KB page image into the WAL. This ensures partially written pages are recovered safely and consistently.
Why do we need WAL ?
A natural question is:
Why not simply write every data change directly to the table data files on the disk and what is the need of having a WAL?
The answer is performance.
A transaction may modify multiple table pages and index pages scattered across disk. Flushing all of these pages during every commit would significantly increase the transaction latency.
Instead, PostgreSQL write compact WAL records sequentially to an append only log files. Sequential writes are much faster than repeatedly updating data pages across many location on the disk.
This allows PostgreSQL to acknowledge the commits quickly while deferring expensive data page writes until later.
WAL also provides crash recovery. If PostgreSQL crashes after WAL file has been persisted to the disk but before the corresponding data pages have been written to the disk, recovery replays the WAL records and bring the data files to a consistent state.
Every WAL record is associated with a Log Sequence Number (LSN), which identifies a specific position in the WAL stream. LSNs are used for crash recovery, replication, and Change Data Capture (CDC) systems.
WAL Storage and Segment files
WAL records are stored in PostgreSQL data directory under :
$PG_DATA/pg_wal
Example
root@5c7b10ae9ab3:/var/lib/postgresql/18/docker/pg_wal# ls
000000010000000000000052 00000001000000000000005E 00000001000000000000006A 000000010000000000000076 000000010000000000000082 00000001000000000000008E
000000010000000000000057 000000010000000000000063 00000001000000000000006F 00000001000000000000007B 000000010000000000000087 archive_status
000000010000000000000058 000000010000000000000064 000000010000000000000070 00000001000000000000007C 000000010000000000000088 summaries
These files are known as WAL segments. At first glance, the file names may look random, but they actually encode the WAL timeline ID and segment number.
Let’s inspect the wal location:
SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/66D53950
Now let’s ask PostgreSQL which WAL segment contains this position
SELECT pg_walfile_name(pg_current_wal_lsn());
pg_walfile_name
--------------------------
000000010000000000000066
(1 row)
PostgreSQL uses LSN to determine:
- Which WAL segment file contains the record.
- The exact byte position of the record within that segment.
Inspecting WAL records with pg_waldump
WAL segment files are binary and cannot be read directly. PostgreSQL provides the pg_waldump utility to inspect the records stored inside the WAL segment
The following command displays WAL records from a specific segment file:
pg_waldump 000000010000000000000066
Below is an example of WAL record describing an insertion in a table:
rmgr: Heap
tx: 779
lsn: 0/66000058
prev: 0/65FFFFE8
desc: INSERT
off: 55
flags: 0x08
blkref #0:
rel: 1663/16384/16389
blk: 56840
This record tells us that transaction 779 inserted a row in the Heap Table. The record was written at WAL location 0/66000058 and references a page within the relation 16389
If the table contains indexes, PostgreSQL generates additional WAL records describing the corresponding index modification.
rmgr: Btree
tx: 779
lsn: 0/660000C0
prev: 0/66000058
desc: INSERT_LEAF
off: 205
blkref #0:
rel: 1663/16384/16398
blk: 16686
This record describes an insertion in btree index. Together, the heap and the index records ensure that PostgreSQL can reconstuct both the table and its indexes during crash recovery.
Closing Thoughts
Wrie Ahead Logging is the most fundamental mechanism in PostgreSQL that helps in high performance and data durability by decoupling the transaction commits from expensive data file writes on disk.
The key idea is simple: write changes to WAL files first, persist the WAL files, acknowledge the transaction, and write actual data pages later. If a crash occurs before those pages reach disk, PostgreSQL can recover the database by replaying the WAL records.
Understanding WAL stream also helps understanding the foundation for mechanisms like logical and physical replication, Point In Time Recovery, & Change Data Capture systems. All of these features rely on the WAL stream as the authoritative records of database changes.