Description
What happens?
After upgrading to DuckDB 1.2.0, certain Parquet export operations take much longer than on 1.1.3, especially, it seems, for large tables. On an example table with ~150M rows and 20 columns we see performance like:
./duckdb-1.1.3 test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
takes 9s on v1.1.3, whereas
./duckdb-nightly test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
takes 26s on the nightly build, a roughly 3x slowdown. On wider tables the difference can be much worse.
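For a side-by-side comparison, a loop like the one below can time both binaries on the same query. The binary paths ./duckdb-1.1.3 and ./duckdb-nightly follow the commands above; adjust them to wherever your local builds live.

# Time the same Parquet export on both binaries back to back.
for bin in ./duckdb-1.1.3 ./duckdb-nightly; do
  echo "timing $bin"
  time "$bin" test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
done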
To Reproduce
Here is a Dropbox link to a database that demonstrates the issue: https://www.dropbox.com/scl/fi/jd0y5cjdabwfgriuwhick/test_parquet_writes.db.gz?rlkey=tfhu5mxtsm9ng4yv0bhzcwtok&st=l8blewpt&dl=1
After downloading and unzipping, run:
time duckdb test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
on both 1.2.0 and 1.1.3, and compare the timings.
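If the Dropbox database is unavailable, a stand-in table of similar shape can be generated locally. This is a sketch only: the ~150M rows and 20 columns are taken from the description above, but the column types (one BIGINT plus 19 DOUBLEs) are assumptions, not the original t1 schema.

# Sketch: build a stand-in t1 with ~150M rows and 20 columns.
# Column types are assumptions; the original schema is unknown.
duckdb test_parquet_writes.db -c "
create table t1 as
select
  range as c0,
  random() as c1, random() as c2, random() as c3, random() as c4,
  random() as c5, random() as c6, random() as c7, random() as c8,
  random() as c9, random() as c10, random() as c11, random() as c12,
  random() as c13, random() as c14, random() as c15, random() as c16,
  random() as c17, random() as c18, random() as c19
from range(150_000_000);
"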
OS:
macOS (ARM / Apple Silicon)
DuckDB Version:
1.2.0
DuckDB Client:
CLI
Hardware:
Reproduces on both a 16-core / 128 GB machine and an 8-core / 8 GB machine.
Full Name:
Matt Hanlon
Affiliation:
PwC
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
Yes, I have