PostgreSQL extension for DuckDB
pg_duckdb integrates DuckDB's columnar-vectorized analytics engine into PostgreSQL, enabling high-performance analytics and data-intensive applications. Built in collaboration with Hydra and MotherDuck.
- Execute analytics queries without changes: run your existing SQL analytics queries as you normally would, and pg_duckdb will automatically use DuckDB's SQL engine to execute them when you set duckdb.force_execution = true.
- Read/write data from data lakes: read and write Parquet, CSV, JSON, Iceberg & Delta Lake from S3, GCS, Azure & R2.
- Integration with cloud analytics: out-of-the-box support for MotherDuck as a compute provider.
pg_duckdb automatically accelerates your existing analytical queries.
- No special syntax needed: for queries on your regular PostgreSQL tables, you don't need to change your SQL. Just run your SELECT statements as you normally would, and pg_duckdb will use DuckDB's engine to execute them.
- No data export required: you do not need to export your data to Parquet or any other format. pg_duckdb works directly with your existing PostgreSQL tables.
This is the most common and straightforward use case. If you have a standard PostgreSQL table, you can query it using standard SQL.
Example: let's say you have a PostgreSQL table named orders (to create it, see the syntax guide). To run an analytical query, you just write standard SQL, set duckdb.force_execution = true, and pg_duckdb will handle the rest.
SET duckdb.force_execution = true;
SELECT
order_date,
COUNT(*) AS number_of_orders,
SUM(amount) AS total_revenue
FROM
orders
GROUP BY
order_date
ORDER BY
order_date;
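To confirm that DuckDB is executing the query, inspect the plan: with duckdb.force_execution enabled, the top plan node should be a DuckDB custom scan (the exact node label may vary by pg_duckdb version).
EXPLAIN
SELECT
  order_date,
  COUNT(*) AS number_of_orders
FROM
  orders
GROUP BY
  order_date;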
pg_duckdb allows you to query external files (like Parquet or CSV) as if they were tables in your database. This is perfect for querying data lakes directly from PostgreSQL. To learn more about these functions, see the read functions documentation.
-- Setup S3 access in seconds directly from SQL
SELECT duckdb.create_simple_secret(
type := 'S3', key_id := 'your_key', secret := 'your_secret', region := 'us-east-1'
);
SELECT
r['product_name'], -- 'r' is the row alias for the rows returned by read_parquet()
AVG(r['rating']) AS average_rating
FROM
read_parquet('s3://your-bucket/reviews.parquet') r
GROUP BY
r['product_name']
ORDER BY
average_rating DESC;
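The other read functions follow the same pattern. As a sketch, a hypothetical CSV file (the path and column names below are placeholders) can be queried with read_csv using the same row-alias access:
-- read_csv uses the same r['column'] access pattern as read_parquet
SELECT
  r['event_type'],
  COUNT(*) AS event_count
FROM
  read_csv('s3://your-bucket/events.csv') r
GROUP BY
  r['event_type'];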
You can easily join your PostgreSQL tables with external data from your data lake.
-- Join a PostgreSQL table with a remote Parquet file
SELECT
o.product_name,
o.total_revenue,
r.average_rating
FROM
(
-- First, aggregate our local orders data
SELECT
product_name,
SUM(amount) AS total_revenue
FROM
orders
GROUP BY
product_name
) o
JOIN
(
-- Then, aggregate our remote reviews data
SELECT
r['product_name'] AS product_name,
AVG(r['rating']) AS average_rating
FROM
read_parquet('s3://your-bucket/reviews.parquet') r
GROUP BY
r['product_name']
) r ON o.product_name = r.product_name
ORDER BY
o.total_revenue DESC;
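Writing goes the other way: query results can be exported back to the lake with COPY ... TO, with DuckDB inferring the Parquet format from the file extension (a minimal sketch; the bucket path is a placeholder).
-- Export aggregated results back to the data lake as Parquet
COPY (
  SELECT product_name, SUM(amount) AS total_revenue
  FROM orders
  GROUP BY product_name
) TO 's3://your-bucket/revenue_summary.parquet';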
Work with modern data formats like DuckLake, Iceberg, and Delta Lake. To learn more, see the extensions documentation.
-- Query Apache Iceberg tables with time travel
SELECT duckdb.install_extension('iceberg');
SELECT * FROM iceberg_scan('s3://warehouse/sales_iceberg', version := '2024-03-15-snapshot');
-- Process Delta Lake with schema evolution
SELECT duckdb.install_extension('delta');
SELECT * FROM delta_scan('s3://lakehouse/user_events');
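Installed extensions are tracked by pg_duckdb, so you can check what is available from SQL (the duckdb.extensions catalog table below is assumed from the current docs and may vary by version).
-- List DuckDB extensions known to pg_duckdb
SELECT * FROM duckdb.extensions;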
pg_duckdb integrates with MotherDuck, a cloud analytics platform. This allows you to run your queries on MotherDuck's powerful compute infrastructure while still using your existing PostgreSQL tables. To learn more, see the MotherDuck documentation.
-- Connect to MotherDuck
CALL duckdb.enable_motherduck('<your_motherduck_token>');
-- Your existing MotherDuck tables appear automatically
SELECT region, COUNT(*) FROM my_cloud_analytics_table GROUP BY region;
-- Create cloud tables that sync across teams
CREATE TABLE real_time_kpis USING duckdb AS
SELECT
date_trunc('day', created_at) as date,
COUNT(*) as daily_signups,
SUM(revenue) as daily_revenue
FROM user_events
GROUP BY date;
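Once created, the cloud-backed table is queried like any local table:
-- Read back the MotherDuck-backed table
SELECT date, daily_signups, daily_revenue
FROM real_time_kpis
ORDER BY date DESC
LIMIT 7;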
Run PostgreSQL with pg_duckdb pre-installed in a Docker container:
docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:17-1.0.0
With MotherDuck:
export MOTHERDUCK_TOKEN=<your_token>
docker run -d -e POSTGRES_PASSWORD=duckdb -e MOTHERDUCK_TOKEN pgduckdb/pgduckdb:17-1.0.0
You can also get started using Hydra:
pip install hydra-cli
hydra
pgxman (apt):
pgxman install pg_duckdb
Compile from source:
git clone https://github.com/duckdb/pg_duckdb
cd pg_duckdb
make install
See compilation guide for detailed instructions.
See settings documentation for complete configuration options.
| Topic | Description |
|---|---|
| Functions | Complete function reference |
| Syntax Guide & Gotchas | Quick reference for common SQL patterns and things to know |
| Types | Supported data types and advanced type usage |
| MotherDuck | Cloud integration guide |
| Secrets | Credential management |
| Extensions | DuckDB extension usage |
| Transactions | Transaction behavior |
| Compilation | Build from source |
Note: Advanced DuckDB types (STRUCT, MAP, UNION) require a DuckDB execution context. Use duckdb.query() for complex type operations, and TEMP tables for DuckDB table creation in most cases. See the Types documentation for details.
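For example, a STRUCT value can be produced through duckdb.query(), which runs its argument entirely in DuckDB (a minimal sketch assuming the documented duckdb.query(text) function; the struct literal uses DuckDB syntax):
-- The dollar-quoted query is executed by DuckDB, so DuckDB-only syntax is allowed
SELECT * FROM duckdb.query($$ SELECT {'name': 'duck', 'lifespan': 10} AS bird $$);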
pg_duckdb excels at:
- Analytical Workloads: Aggregations, window functions, complex JOINs (see the sketch after this list)
- Data Lake Queries: Scanning large Parquet/CSV files
- Mixed Workloads: Combining OLTP (PostgreSQL) with OLAP (DuckDB)
- ETL Pipelines: Transform and load data at scale
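As a sketch of the analytical-workload case, here is a window-function query over the orders table from the earlier example:
SET duckdb.force_execution = true;
-- Running revenue total computed by DuckDB's vectorized engine
SELECT
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_revenue
FROM orders;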
We welcome contributions! Please see:
- Contributing Guidelines
- Code of Conduct
- Project Milestones for upcoming features
- Discussions for feature requests
- Issues for bug reports
- Join the DuckDB Discord community then chat in the #pg_duckdb channel.
- Documentation: Complete documentation
- Community: DuckDB Discord #pg_duckdb channel
- Issues: GitHub Issues
- Commercial: Hydra and MotherDuck offer commercial support
- PostgreSQL: 14, 15, 16, 17
- Operating Systems: Ubuntu 22.04-24.04, macOS
Licensed under the MIT License.
Built with ❤️ in collaboration with Hydra and MotherDuck.