
How is estimated cardinality (EC) calculated when running EXPLAIN ANALYZE on a query? #10523

@qzbqzb

Description


What happens?

Hello! I am testing the performance of the read_parquet function. The dataset has 143,997,065 records, the Parquet files total about 29 GB, and every file has 144 columns. In my three cases I got significantly different results.

To Reproduce

Case 1:

The 29 GB of data as a single Parquet file; here is the profiling:
(profiling screenshots attached)

Case 2:

The 29 GB of data split into 12 Parquet files; here is the profiling:
(profiling screenshots attached)

Case 3:

The 29 GB of data split into ~128 MB files, about 268 files in total; here is the profiling:
(profiling screenshots attached)
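
The post does not include the query itself, so here is a minimal sketch of the kind of reproduction being described; the file paths, glob patterns, and the count(*) aggregation are assumptions, not taken from the report:

```sql
-- Case 1: one 29 GB Parquet file (hypothetical path)
EXPLAIN ANALYZE SELECT count(*) FROM read_parquet('data/big.parquet');

-- Case 2: the same data split into 12 Parquet files
EXPLAIN ANALYZE SELECT count(*) FROM read_parquet('data/split_12/*.parquet');

-- Case 3: the same data split into ~268 files of ~128 MB each
EXPLAIN ANALYZE SELECT count(*) FROM read_parquet('data/split_128mb/*.parquet');
```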

Comparing the results of the three cases, Case 1 performs similarly to Case 3, while Case 2 is slower than the others. I noticed that the estimated cardinality (EC) differs between the cases, and the different EC values result in different query plans. Could you tell me how the estimated cardinality (EC) is calculated when running EXPLAIN ANALYZE on a query?
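
For side-by-side comparison of the plans the optimizer chooses for the three layouts, plain EXPLAIN prints the plan, including per-operator estimates, without executing the query. A sketch, again with a hypothetical path:

```sql
-- EXPLAIN shows the chosen plan and each operator's estimated cardinality
-- without running the query; EXPLAIN ANALYZE additionally executes it and
-- reports actual row counts and timings.
EXPLAIN SELECT count(*) FROM read_parquet('data/split_12/*.parquet');
```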

OS:

Windows 10

DuckDB Version:

0.9.2

DuckDB Client:

duckdb_cli-windows-amd64.zip

Full Name:

Tom

Affiliation:

Strive

Have you tried this on the latest nightly build?

I have tested with a nightly build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
