Description
What happens?
Hello! I am testing the performance of the read_parquet function. The dataset has 143,997,065 records, the Parquet files total about 29 GB, and every file has 144 columns. In the three cases below I got significantly different results.
To Reproduce
Case 1:
The 29 GB as a single large Parquet file; here is the profiling:
Case 2:
The 29 GB split into 12 Parquet files; here is the profiling:
Case 3:
The 29 GB split into small files of about 128 MB each, roughly 268 files (a sketch of the queries I ran follows below).
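For reference, a minimal sketch of how the three cases could be run in the DuckDB CLI. The file paths, the glob patterns, and the count(*) query are assumptions for illustration, since the original queries and profiling output are not reproduced here:

```sql
-- Case 1: one large Parquet file (hypothetical path)
EXPLAIN ANALYZE SELECT count(*) FROM read_parquet('data/big.parquet');

-- Case 2: the same data split into 12 files (hypothetical glob)
EXPLAIN ANALYZE SELECT count(*) FROM read_parquet('data/split12/*.parquet');

-- Case 3: the same data split into ~268 files of ~128 MB each (hypothetical glob)
EXPLAIN ANALYZE SELECT count(*) FROM read_parquet('data/split128mb/*.parquet');
```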
Comparing the three cases, case 1 performs similarly to case 3, while case 2 is slower than the others.
I noticed that the estimated cardinality (EC) differs between the cases, and the different EC leads to different query plans. Could you tell me how the estimated cardinality (EC) is calculated when running EXPLAIN ANALYZE on a query?
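As a side note on how I looked at the files: the per-file and per-row-group row counts that the Parquet reader sees can be inspected with the built-in parquet_metadata table function. The path below is a placeholder, and the selected column names are assumed to match the 0.9.2 output:

```sql
-- Row-group level metadata, including the number of rows per row group (placeholder path)
SELECT file_name, row_group_id, row_group_num_rows
FROM parquet_metadata('data/split12/*.parquet');
```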
OS:
windows10
DuckDB Version:
0.9.2
DuckDB Client:
duckdb_cli-windows-amd64.zip
Full Name:
Tom
Affiliation:
Strive
Have you tried this on the latest nightly build?
I have tested with a nightly build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have