Stream PostgreSQL WALs with Zero Data Loss
pgrwl is a PostgreSQL write-ahead log (WAL) receiver written in Go. It's a drop-in, container-friendly alternative to pg_receivewal, supporting streaming replication, encryption, compression, and remote storage (S3, SFTP). Designed for disaster recovery and PITR (Point-in-Time Recovery), pgrwl ensures zero data loss (RPO=0) and integrates seamlessly with Kubernetes environments.
- About
- Usage
- Quick Start
- Configuration Reference
- Installation
- Disaster Recovery Use Cases
- Architecture
- Contributing
- License
- The project serves as a research platform to explore streaming WAL archiving with a target of RPO=0 during recovery.
- It's primarily designed for use in containerized environments.
- The utility replicates all key features of pg_receivewal, including automatic reconnection on connection loss, streaming into partial files, extensive error checking, and more.
- Install as a single binary. Debug with your favorite editor and a local PostgreSQL container (local-dev-infra).
- pgrwl running in receive mode
- pgrwl running in serve mode
- pgrwl running in backup mode
Receive mode is the main loop of the WAL receiver.
#!/usr/bin/env bash
set -euo pipefail
# Clone repo
git clone https://github.com/hashmap-kz/pgrwl.git
cd pgrwl
# Prepare docker-compose files
cat <<EOF >pg_hba.conf
local all all trust
local replication all trust
host all all all trust
host replication all all trust
EOF
cat <<EOF >docker-compose.yml
services:
  pg-primary:
    image: postgres:17
    container_name: pg-primary
    restart: unless-stopped
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "15432:5432"
    volumes:
      - pg-primary-data:/var/lib/postgresql/17/main
      - ./pg_hba.conf:/etc/postgresql/pg_hba.conf:ro
    command: >-
      -c wal_level=replica
      -c max_wal_senders=10
      -c wal_keep_size=64MB
      -c listen_addresses=*
      -c log_replication_commands=on
      -c hba_file=/etc/postgresql/pg_hba.conf
volumes:
  pg-primary-data:
EOF
# Run containers, wait while pg is ready
docker compose up -d
until docker exec pg-primary pg_isready -U postgres >/dev/null 2>&1; do
echo "Waiting for PostgreSQL to be ready..."
sleep 1
done
# Run WAL receiver with local storage
cat <<EOF >config.yml
main:
  listen_port: 7070
  directory: wals
receiver:
  slot: pgrwl_v5
log:
  level: trace
  format: text
  add_source: true
EOF
export PGHOST=localhost
export PGPORT=15432
export PGUSER=postgres
export PGPASSWORD=postgres
export PGRWL_MODE=receive
go run main.go start -c config.yml
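Once the receiver is running, one quick way to confirm that segments are arriving is to generate some WAL on the primary, force a segment switch, and inspect the local archive directory configured above (run this in a second terminal; the table name is just an example):

# Generate WAL traffic and force a segment switch on the primary,
# then list the local archive directory ('wals', per config.yml above).
docker exec pg-primary psql -U postgres -c "CREATE TABLE IF NOT EXISTS pgrwl_smoke(x int); INSERT INTO pgrwl_smoke SELECT generate_series(1, 100000);"
docker exec pg-primary psql -U postgres -c "SELECT pg_switch_wal();"
ls -lh wals/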
Serve mode is used during restore to serve archived WAL files from storage.
cat <<EOF >config.yml
main:
  listen_port: 7070
  directory: wals
log:
  level: trace
  format: text
  add_source: true
EOF
export PGRWL_MODE=serve
pgrwl start -c config.yml
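To spot-check that a serve instance can hand out archived segments, the same restore-command helper that PostgreSQL invokes (shown further below) can be run by hand; the segment name and output path here are only examples:

# Fetch a single archived segment from a 'serve' instance on localhost:7070
# (segment name and output path are illustrative).
pgrwl restore-command --serve-addr=localhost:7070 000000010000000000000003 /tmp/000000010000000000000003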
Backup mode performs a full base backup of your PostgreSQL cluster on a configured schedule.
cat <<EOF >config.yml
main:
  listen_port: 7070
  directory: wals
backup:
  cron: "0 0 */3 * *"
  retention:
    enable: true
    type: time
    value: "96h"
log:
  level: trace
  format: text
  add_source: true
EOF
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres
export PGRWL_MODE=backup
pgrwl start -c config.yml
restore_command example for postgresql.conf:
# where 'k8s-worker5:30266' represents the host and port
# of a 'pgrwl' instance running in 'serve' mode.
restore_command = 'pgrwl restore-command --serve-addr=k8s-worker5:30266 %f %p'
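For context, a rough PITR flow built on this might look like the sketch below. It assumes a base backup has already been restored into $PGDATA and a pgrwl serve instance is reachable at the same address; the recovery target time is a placeholder:

# Rough PITR sketch (assumptions: base backup already restored into $PGDATA,
# a 'serve' instance reachable at k8s-worker5:30266, placeholder target time).
cat <<'EOF' >>"$PGDATA/postgresql.conf"
restore_command = 'pgrwl restore-command --serve-addr=k8s-worker5:30266 %f %p'
recovery_target_time = '2025-01-01 12:00:00'
recovery_target_action = 'promote'
EOF
touch "$PGDATA/recovery.signal"
pg_ctl -D "$PGDATA" start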
The configuration file is in JSON or YAML format (*.json is preferred). It supports environment variable placeholders like ${PGRWL_SECRET_ACCESS_KEY}. You may either use pgrwl start -c config.yml -m receive, or provide the corresponding environment variables and run pgrwl start.
main:                                     # Required for both modes: receive/serve
  listen_port: 7070                       # HTTP server port (used for management)
  directory: "/var/lib/pgwal"             # Base directory for storing WAL files
receiver:                                 # Required for 'receive' mode
  slot: replication_slot                  # Replication slot to use
  no_loop: false                          # If true, do not loop on connection loss
  uploader:                               # Required for non-local storage type
    sync_interval: 10s                    # Interval for the upload worker to check for new files
    max_concurrency: 4                    # Maximum number of files to upload concurrently
  retention:                              # Optional
    enable: true                          # Perform retention rules
    sync_interval: 10s                    # Interval for the retention worker (shouldn't run frequently - 12h is typically sufficient)
    keep_period: "1m"                     # Remove WAL files older than the given period
backup:                                   # Required for 'backup' mode
  cron: "0 0 */3 * *"                     # Basebackup cron schedule
  retention:                              # Optional
    enable: true                          # Perform retention rules
    type: time                            # One of: (time / count)
    value: "48h"                          # Remove backups older than the given period (if time), keep last N backups (if count)
    keep_last: 1                          # Always keep last N backups (suitable when 'retention.type = time')
  walretention:                           # Optional (WAL archive cleanup settings)
    enable: true                          # After a basebackup is done, clean up the WAL archive by the oldest backup's stop-LSN
    receiver_addr: "pgrwl-receive:7070"   # Address of the WAL-receiver instance (required when 'walretention.enable' is set to true)
log:                                      # Optional
  level: info                             # One of: (trace / debug / info / warn / error)
  format: text                            # One of: (text / json)
  add_source: true                        # Include file:line in log messages (for local development)
metrics:                                  # Optional
  enable: true                            # Optional (used in receive mode: http://host:port/metrics)
devconfig:                                # Optional (various dev options)
  pprof:                                  # pprof settings
    enable: true                          # Enable pprof handlers
storage:                                  # Optional
  name: s3                                # One of: (s3 / sftp)
  compression:                            # Optional
    algo: gzip                            # One of: (gzip / zstd)
  encryption:                             # Optional
    algo: aes-256-gcm                     # One of: (aes-256-gcm)
    pass: "${PGRWL_ENCRYPT_PASSWD}"       # Encryption password (from env)
  sftp:                                   # Required section for 'sftp' storage
    host: sftp.example.com                # SFTP server hostname
    port: 22                              # SFTP server port
    user: backupuser                      # SFTP username
    pass: "${PGRWL_VM_PASSWORD}"          # SFTP password (from env)
    pkey_path: "/home/user/.ssh/id_rsa"   # Path to SSH private key (optional)
    pkey_pass: "${PGRWL_SSH_PKEY_PASS}"   # Required if the private key is password-protected
    base_dir: "/mnt/wal-archive"          # Base directory with sufficient user permissions
  s3:                                     # Required section for 's3' storage
    url: https://s3.example.com           # S3-compatible endpoint URL
    access_key_id: AKIAEXAMPLE            # AWS access key ID
    secret_access_key: "${PGRWL_AWS_SK}"  # AWS secret access key (from env)
    bucket: postgres-backups              # Target S3 bucket name
    region: us-east-1                     # S3 region
    use_path_style: true                  # Use path-style URLs for S3
    disable_ssl: false                    # Disable SSL
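As an illustration, a minimal receive configuration with S3 storage and zstd compression could be assembled from the reference above; the endpoint, credentials, bucket, and slot name are placeholders:

# Minimal 'receive' configuration with S3 storage and zstd compression.
# The quoted heredoc keeps ${PGRWL_AWS_SK} literal so pgrwl resolves it at runtime.
cat <<'EOF' >config.yml
main:
  listen_port: 7070
  directory: /var/lib/pgwal
receiver:
  slot: pgrwl_s3
  uploader:
    sync_interval: 10s
    max_concurrency: 4
storage:
  name: s3
  compression:
    algo: zstd
  s3:
    url: https://s3.example.com
    access_key_id: AKIAEXAMPLE
    secret_access_key: "${PGRWL_AWS_SK}"
    bucket: postgres-backups
    region: us-east-1
    use_path_style: true
EOF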
Corresponding environment variables:
PGRWL_MAIN_LISTEN_PORT # HTTP server port (used for management)
PGRWL_MAIN_DIRECTORY # Base directory for storing WAL files
PGRWL_RECEIVER_SLOT # Replication slot to use
PGRWL_RECEIVER_NO_LOOP # If true, do not loop on connection loss
PGRWL_RECEIVER_UPLOADER_SYNC_INTERVAL # Interval for the upload worker to check for new files
PGRWL_RECEIVER_UPLOADER_MAX_CONCURRENCY # Maximum number of files to upload concurrently
PGRWL_RECEIVER_RETENTION_ENABLE # Perform retention rules
PGRWL_RECEIVER_RETENTION_SYNC_INTERVAL # Interval for the retention worker (shouldn't run frequently - 12h is typically sufficient)
PGRWL_RECEIVER_RETENTION_KEEP_PERIOD # Remove WAL files older than given period
PGRWL_BACKUP_CRON # Basebackup cron schedule
PGRWL_BACKUP_RETENTION_ENABLE # Perform retention rules
PGRWL_BACKUP_RETENTION_TYPE # One of: (time / count)
PGRWL_BACKUP_RETENTION_VALUE # Remove backups older than given period (if time), keep last N backups (if count)
PGRWL_BACKUP_RETENTION_KEEP_LAST # Always keep last N backups (suitable when 'retention.type = time')
PGRWL_BACKUP_WALRETENTION_ENABLE # After a basebackup is done, clean up the WAL archive by the oldest backup's stop-LSN
PGRWL_BACKUP_WALRETENTION_RECEIVER_ADDR # Address of the WAL-receiver instance (required when PGRWL_BACKUP_WALRETENTION_ENABLE is set to true)
PGRWL_LOG_LEVEL # One of: (trace / debug / info / warn / error)
PGRWL_LOG_FORMAT # One of: (text / json)
PGRWL_LOG_ADD_SOURCE # Include file:line in log messages (for local development)
PGRWL_METRICS_ENABLE # Optional (used in receive mode: http://host:port/metrics)
PGRWL_DEVCONFIG_PPROF_ENABLE # Enable pprof handlers
PGRWL_STORAGE_NAME # One of: (s3 / sftp)
PGRWL_STORAGE_COMPRESSION_ALGO # One of: (gzip / zstd)
PGRWL_STORAGE_ENCRYPTION_ALGO # One of: (aes-256-gcm)
PGRWL_STORAGE_ENCRYPTION_PASS # Encryption password (from env)
PGRWL_STORAGE_SFTP_HOST # SFTP server hostname
PGRWL_STORAGE_SFTP_PORT # SFTP server port
PGRWL_STORAGE_SFTP_USER # SFTP username
PGRWL_STORAGE_SFTP_PASS # SFTP password (from env)
PGRWL_STORAGE_SFTP_PKEY_PATH # Path to SSH private key (optional)
PGRWL_STORAGE_SFTP_PKEY_PASS # Required if the private key is password-protected
PGRWL_STORAGE_SFTP_BASE_DIR # Base directory with sufficient user permissions
PGRWL_STORAGE_S3_URL # S3-compatible endpoint URL
PGRWL_STORAGE_S3_ACCESS_KEY_ID # AWS access key ID
PGRWL_STORAGE_S3_SECRET_ACCESS_KEY # AWS secret access key (from env)
PGRWL_STORAGE_S3_BUCKET # Target S3 bucket name
PGRWL_STORAGE_S3_REGION # S3 region
PGRWL_STORAGE_S3_USE_PATH_STYLE # Use path-style URLs for S3
PGRWL_STORAGE_S3_DISABLE_SSL # Disable SSL
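For example, the receive-mode quick start above could be expressed entirely through environment variables (values are illustrative):

# Env-only invocation of 'receive' mode with local storage (illustrative values).
export PGHOST=localhost PGPORT=5432 PGUSER=postgres PGPASSWORD=postgres
export PGRWL_MODE=receive
export PGRWL_MAIN_LISTEN_PORT=7070
export PGRWL_MAIN_DIRECTORY=/var/lib/pgwal
export PGRWL_RECEIVER_SLOT=pgrwl_slot
pgrwl start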
Docker images are available at quay.io/hashmap_kz/pgrwl
docker pull quay.io/hashmap_kz/pgrwl:latest
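A containerized run might look like the sketch below; it assumes the image's entrypoint is the pgrwl binary, and the mounted paths and connection settings are placeholders:

# Sketch: run 'receive' mode in a container (assumes the image entrypoint is pgrwl;
# connection settings and paths are placeholders).
docker run --rm \
  -v "$PWD/config.yml:/config.yml:ro" \
  -v "$PWD/wals:/var/lib/pgwal" \
  -e PGHOST=10.0.0.10 -e PGPORT=5432 -e PGUSER=postgres -e PGPASSWORD=postgres \
  -e PGRWL_MODE=receive \
  quay.io/hashmap_kz/pgrwl:latest start -c /config.yml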
- Download the latest binary for your platform from the Releases page.
- Place the binary in your system's PATH (e.g., /usr/local/bin).
(
set -euo pipefail
OS="$(uname | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')"
TAG="$(curl -s https://api.github.com/repos/hashmap-kz/pgrwl/releases/latest | jq -r .tag_name)"
curl -L "https://github.com/hashmap-kz/pgrwl/releases/download/${TAG}/pgrwl_${TAG}_${OS}_${ARCH}.tar.gz" |
tar -xzf - -C /usr/local/bin && \
chmod +x /usr/local/bin/pgrwl
)
sudo apt update -y && sudo apt install -y curl
curl -LO https://github.com/hashmap-kz/pgrwl/releases/latest/download/pgrwl_linux_amd64.deb
sudo dpkg -i pgrwl_linux_amd64.deb
apk update && apk add --no-cache bash curl
curl -LO https://github.com/hashmap-kz/pgrwl/releases/latest/download/pgrwl_linux_amd64.apk
apk add pgrwl_linux_amd64.apk --allow-untrusted
See pgrwl-helm-chart
helm repo add pgrwl https://hashmap-kz.github.io/pgrwl
helm repo update
To install the chart with the release name pgrwl:
helm install pgrwl pgrwl/pgrwl
The full process may look like this (a typical, rough, and simplified example):
- A typical production setup runs two pgrwl StatefulSets in the cluster: one in receive mode for continuous WAL streaming, and another in backup mode for scheduled base backups (a rough sketch of both configurations follows this list).
- In receive mode, pgrwl continuously streams WAL files, applies optional compression and encryption, uploads them to remote storage (such as S3 or SFTP), and enforces retention policies - for example, keeping WAL files for four days.
- In backup mode, it performs a full base backup of your PostgreSQL cluster on a configured schedule - for instance, once every three days - using streaming basebackup, with optional compression and encryption. The resulting backup is also uploaded to the configured remote storage and is subject to retention policies for cleanup. The built-in cron scheduler enables fully automated backups without requiring external orchestration.
- During recovery, the same receive StatefulSet can be reconfigured to run in serve mode, exposing previously archived WALs via HTTP to support Point-in-Time Recovery (PITR) through restore_command.
- With this setup, you're able to restore your cluster - in the event of a crash - to any second within the past three days, using the most recent base backup and the available WAL segments.
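For illustration only, the two workloads might be configured roughly like this; names, addresses, and retention values are placeholders, and both instances would also carry the same storage section from the configuration reference:

# Sketch of the 'receive' instance (continuous WAL streaming, keep WALs ~4 days).
cat <<'EOF' >receive-config.yml
main:
  listen_port: 7070
  directory: /var/lib/pgwal
receiver:
  slot: pgrwl_prod
  retention:
    enable: true
    sync_interval: 12h
    keep_period: "96h"
EOF

# Sketch of the 'backup' instance (base backup every three days, WAL-archive
# cleanup driven by the oldest retained backup via the receive service).
cat <<'EOF' >backup-config.yml
main:
  listen_port: 7070
  directory: /var/lib/pgwal
backup:
  cron: "0 0 */3 * *"
  retention:
    enable: true
    type: time
    value: "72h"
    keep_last: 1
  walretention:
    enable: true
    receiver_addr: "pgrwl-receive:7070"
EOF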
pgrwl is designed to always stream WAL data to the local filesystem first. This design ensures durability and correctness, especially in synchronous replication setups where PostgreSQL waits for the replica to confirm the commit.
- Incoming WAL data is written directly to *.partial files in a local directory.
- These *.partial files are synced (fsync) after each write to ensure that WAL segments are fully durable on disk.
- Once a WAL segment is fully received, the *.partial suffix is removed, and the file is considered complete.
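While streaming, the local spool directory therefore typically contains completed segments plus one in-progress segment (segment names below are illustrative):

# Completed segments plus the segment currently being written (names illustrative).
ls wals/
# 000000010000000000000001
# 000000010000000000000002
# 000000010000000000000003.partial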
Compression and encryption are applied only after a WAL segment is completed:
- Completed files are passed to the uploader worker, which may compress and/or encrypt them before uploading to a remote backend (e.g., S3, SFTP).
- The uploader worker ignores partial files and operates only on finalized, closed segments.
This model avoids the complexity and risk of streaming incomplete WAL data directly to remote storage, which can lead to inconsistencies or partial restores. By ensuring that all WAL files are locally durable and only completed files are uploaded, pgrwl guarantees restore safety and clean segment handoff for disaster recovery.
In short: PostgreSQL requires acknowledgments for commits in synchronous setups, and relying on external systems for critical paths (like WAL streaming) could introduce unacceptable delays or failures. This architecture mitigates that risk.
- After each WAL segment is written, an fsync is performed on the currently open WAL file to ensure durability.
- An fsync is triggered when a WAL segment is completed and the *.partial file is renamed to its final form.
- An fsync is triggered when a keepalive message is received from the server with the reply_requested option set.
- Additionally, fsync is called whenever an error occurs during the receive-copy loop.
There's a significant difference between using archive_command and archiving WAL files via the streaming replication protocol.
The archive_command is triggered only after a WAL file is fully completed - typically when it reaches 16 MiB (the default segment size). This means that in a crash scenario, you could lose up to 16 MiB of data. You can mitigate this by setting a lower archive_timeout (e.g., 1 minute), but even then, in a worst-case scenario, you risk losing up to 1 minute of data.
Also, it's important to note that PostgreSQL preallocates WAL files to the configured wal_segment_size, so they are created at full size regardless of how much data has been written. (Quote from the documentation: "It is therefore unwise to set a very short archive_timeout - it will bloat your archive storage.")
In contrast, streaming WAL archiving - when used with replication slots and the synchronous_standby_names parameter - ensures that the system can be restored to the latest committed transaction.
This approach provides true zero data loss (RPO=0), making it ideal for high-durability requirements.
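For reference, the primary-side settings for such a synchronous setup might look roughly like this; 'walreceiver_1' is a placeholder that must match the application_name the WAL receiver connects with (for libpq-compatible clients this is usually taken from the PGAPPNAME environment variable or the connection string):

# Illustrative primary-side settings for a synchronous, zero-data-loss setup.
# 'walreceiver_1' is a placeholder for the receiver's application_name.
cat <<'EOF' >>postgresql.conf
wal_level = replica
max_wal_senders = 10
synchronous_standby_names = 'walreceiver_1'
EOF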
Contributions are welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.
Check also the Developer Notes for additional information and guidelines.
- pg_receivewal Documentation
- pg_receivewal Source Code
- Streaming Replication Protocol
- Continuous Archiving and Point-in-Time Recovery
- Setting Up WAL Archiving
MIT License. See LICENSE for details.