pg_duckdb: Official PostgreSQL Extension for DuckDB

pg_duckdb integrates DuckDB's columnar-vectorized analytics engine into PostgreSQL, enabling high-performance analytics and data-intensive applications. Built in collaboration with Hydra and MotherDuck.

Key Features

  • Execute analytics queries without changes: run your existing SQL analytics queries as you normally would, and pg_duckdb will automatically use DuckDB's engine to execute them once you set duckdb.force_execution = true.
  • Read/write data from data lakes: read and write Parquet, CSV, JSON, Iceberg & Delta Lake from S3, GCS, Azure & R2.
  • Integration with cloud analytics: out-of-the-box support for MotherDuck as a compute provider.

How pg_duckdb works

pg_duckdb automatically accelerates your existing analytical queries.

  • No special syntax needed: for queries on your regular PostgreSQL tables, you don't need to change your SQL. Just run your SELECT statements as you normally would, and pg_duckdb will use DuckDB's engine to execute them.
  • No data export required: You do not need to export your data to Parquet or any other format. pg_duckdb works directly with your existing PostgreSQL tables.

See it in action

Querying your existing PostgreSQL data

This is the most common and straightforward use case. If you have a standard PostgreSQL table, you can query it using standard SQL.

Example:

Let's say you have a PostgreSQL table named orders (to create it, see the syntax guide). To run an analytical query, you just write standard SQL, set duckdb.force_execution, and pg_duckdb handles the rest.
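
If you don't already have such a table, a minimal orders schema for this example might look like the following. The column names and types here are assumptions chosen to match the queries below; adjust them to your data.

-- Hypothetical schema for the examples in this README
CREATE TABLE orders (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_date date NOT NULL,
    product_name text NOT NULL,
    amount numeric(12, 2) NOT NULL
);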

SET duckdb.force_execution = true;
SELECT
    order_date,
    COUNT(*) AS number_of_orders,
    SUM(amount) AS total_revenue
FROM
    orders
GROUP BY
    order_date
ORDER BY
    order_date;

Querying external data (your first data lake query)

pg_duckdb allows you to query external files (like Parquet or CSV) as if they were tables in your database. This is perfect for querying data lakes directly from PostgreSQL. To learn more about these functions, see the read functions documentation.

-- Setup S3 access in seconds directly from SQL
SELECT duckdb.create_simple_secret(
    type := 'S3', key_id := 'your_key', secret := 'your_secret', region := 'us-east-1'
);

SELECT
    r['product_name'], -- 'r' is the row alias for read_parquet(); columns are accessed with r['column_name']
    AVG(r['rating']) AS average_rating
FROM
    read_parquet('s3://your-bucket/reviews.parquet') r
GROUP BY
    r['product_name']
ORDER BY
    average_rating DESC;

Combining PostgreSQL and DuckDB data

You can easily join your PostgreSQL tables with external data from your data lake.

-- Join a PostgreSQL table with a remote Parquet file
SELECT
    o.product_name,
    o.total_revenue,
    r.average_rating
FROM
    (
        -- First, aggregate our local orders data
        SELECT
            product_name,
            SUM(amount) AS total_revenue
        FROM
            orders
        GROUP BY
            product_name
    ) o
JOIN
    (
        -- Then, aggregate our remote reviews data
        SELECT
            r['product_name'] AS product_name,
            AVG(r['rating']) AS average_rating
        FROM
            read_parquet('s3://your-bucket/reviews.parquet') r
        GROUP BY
            r['product_name']
    ) r ON o.product_name = r.product_name
ORDER BY
    o.total_revenue DESC;

Modern Data Lake Formats

Work with modern data formats like DuckLake, Iceberg and Delta Lake. To learn more, see extensions documentation.

-- Query Apache Iceberg tables with time travel
SELECT duckdb.install_extension('iceberg');
SELECT * FROM iceberg_scan('s3://warehouse/sales_iceberg', version := '2024-03-15-snapshot');

-- Process Delta Lake with schema evolution
SELECT duckdb.install_extension('delta');
SELECT * FROM delta_scan('s3://lakehouse/user_events');

MotherDuck integration (optional)

pg_duckdb integrates with MotherDuck, a cloud analytics platform. This allows you to run your queries on MotherDuck's powerful compute infrastructure, while still using your existing PostgreSQL tables.

To learn more, see MotherDuck documentation.

-- Connect to MotherDuck
CALL duckdb.enable_motherduck('<your_motherduck_token>');
-- Your existing MotherDuck tables appear automatically
SELECT region, COUNT(*) FROM my_cloud_analytics_table;

-- Create cloud tables that sync across teams
CREATE TABLE real_time_kpis USING duckdb AS
SELECT
    date_trunc('day', created_at) as date,
    COUNT(*) as daily_signups,
    SUM(revenue) as daily_revenue
FROM user_events
GROUP BY date;

Quick Start

Docker

Run PostgreSQL with pg_duckdb pre-installed in a Docker container:

docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:17-1.0.0

With MotherDuck:

export MOTHERDUCK_TOKEN=<your_token>
docker run -d -e POSTGRES_PASSWORD=duckdb -e MOTHERDUCK_TOKEN pgduckdb/pgduckdb:17-1.0.0

Try with Hydra

You can also get started using Hydra:

pip install hydra-cli
hydra

Package Managers

pgxman (apt):

pgxman install pg_duckdb

Compile from source:

git clone https://github.com/duckdb/pg_duckdb
cd pg_duckdb
make install

See compilation guide for detailed instructions.

Configuration

See settings documentation for complete configuration options.
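
The most important setting used throughout this README is duckdb.force_execution. As a sketch, it can be enabled per session or persisted via standard PostgreSQL mechanisms; the role name below is hypothetical.

-- Enable DuckDB execution for the current session
SET duckdb.force_execution = true;

-- Persist the setting for a specific role (hypothetical role name)
ALTER ROLE analytics_user SET duckdb.force_execution = true;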

Documentation

Topic                      Description
Functions                  Complete function reference
Syntax Guide & Gotchas     Quick reference for common SQL patterns and things to know
Types                      Supported data types and advanced type usage
MotherDuck                 Cloud integration guide
Secrets                    Credential management
Extensions                 DuckDB extension usage
Transactions               Transaction behavior
Compilation                Build from source

Note: Advanced DuckDB types (STRUCT, MAP, UNION) require DuckDB execution context. Use duckdb.query() for complex type operations and TEMP tables for DuckDB table creation in most cases. See Types documentation for details.
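
As a hedged sketch of the pattern the note describes, a STRUCT can be produced and consumed entirely inside DuckDB's engine via duckdb.query(); the struct literal below uses DuckDB syntax, and exact result handling may vary by pg_duckdb version.

-- Run a query with a DuckDB STRUCT entirely in the DuckDB execution context
SELECT * FROM duckdb.query($$
    SELECT {'name': 'widget', 'rating': 4.5} AS product
$$);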

Performance

pg_duckdb excels at:

  • Analytical Workloads: Aggregations, window functions, complex JOINs
  • Data Lake Queries: Scanning large Parquet/CSV files
  • Mixed Workloads: Combining OLTP (PostgreSQL) with OLAP (DuckDB)
  • ETL Pipelines: Transform and load data at scale
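
To make the first bullet concrete, here is the kind of window-function query that benefits from DuckDB's vectorized engine; it reuses the orders table from the examples above.

SET duckdb.force_execution = true;

-- Running revenue total per product: a typical analytical window query
SELECT
    product_name,
    order_date,
    SUM(amount) OVER (
        PARTITION BY product_name
        ORDER BY order_date
    ) AS running_revenue
FROM orders;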

Contributing

We welcome contributions! Please see the contributing guidelines and code of conduct in the repository.

Support

Requirements

  • PostgreSQL: 14, 15, 16, 17
  • Operating Systems: Ubuntu 22.04-24.04, macOS

License

Licensed under the MIT License.


Built with ❤️ in collaboration with Hydra and MotherDuck.
