Large limits are much slower in v1.2.0 #16278

@manlon

Description

What happens?

After upgrading to DuckDB 1.2.0, certain Parquet export operations take much longer than on 1.1.3, especially, it seems, for large tables. On an example table with ~150M rows and 20 columns, we see performance like:

```sh
./duckdb-1.1.3 test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
```

takes 9s on v1.1.3, whereas

```sh
./duckdb-nightly test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
```

takes 26s on the nightly build, roughly a 3x slowdown. On wider tables the difference can be much worse.
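For what it's worth, DuckDB's built-in profiler should show which operator the extra time lands in. A minimal sketch of how to capture that on the nightly build (the output path is arbitrary):

```sh
# Sketch: profile the same copy on the nightly build to localize the slowdown.
# /tmp/profile.txt is an arbitrary output path.
./duckdb-nightly test_parquet_writes.db -c "
  pragma enable_profiling='query_tree';
  pragma profiling_output='/tmp/profile.txt';
  set preserve_insertion_order=false;
  copy (from t1 limit 100_000_000) to '/dev/null' (format parquet);
"
```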

To Reproduce

Here is a Dropbox link to a database that demonstrates the issue: https://www.dropbox.com/scl/fi/jd0y5cjdabwfgriuwhick/test_parquet_writes.db.gz?rlkey=tfhu5mxtsm9ng4yv0bhzcwtok&st=l8blewpt&dl=1

After downloading and unzipping, run:

```sh
time duckdb test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
```

on both 1.2.0 and 1.1.3 to compare.
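In case the Dropbox link goes stale, a table of roughly the same shape can be synthesized directly in DuckDB. A minimal sketch, assuming 20 columns of random doubles approximates the shape of the original data (it is not the original data):

```sh
# Hypothetical stand-in for the shared database: ~150M rows, 20 columns.
# Column names and the random-double distribution are guesses at the shape.
duckdb test_parquet_writes.db -c "
  create table t1 as
  select r as c0,
         random() as c1,  random() as c2,  random() as c3,  random() as c4,
         random() as c5,  random() as c6,  random() as c7,  random() as c8,
         random() as c9,  random() as c10, random() as c11, random() as c12,
         random() as c13, random() as c14, random() as c15, random() as c16,
         random() as c17, random() as c18, random() as c19
  from range(150_000_000) t(r);
"
```

Then time the same copy against each binary (paths are whatever you named the two builds):

```sh
for bin in ./duckdb-1.1.3 ./duckdb-1.2.0; do
  echo "$bin"
  time "$bin" test_parquet_writes.db -c "set preserve_insertion_order=false; copy (from t1 limit 100_000_000) to '/dev/null' (format parquet)"
done
```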

OS:

macOS ARM / Apple Silicon

DuckDB Version:

1.2.0

DuckDB Client:

CLI

Hardware:

Reproduces on both a 16-core / 128 GB machine and an 8-core / 8 GB machine.

Full Name:

Matt Hanlon

Affiliation:

PwC

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
