Closed
Description
What happens?
DuckDB only uses a single thread if the first Parquet file is empty.
To Reproduce
Generate multiple test Parquet files.
create table tbl as from generate_series(100000000);
copy tbl to 'test-parquet/' (format parquet, per_thread_output true);
Reading these files is very fast with multiple threads.
explain analyze select * from 'test-parquet/data*.parquet';
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Total Time: 2.76s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│ EXPLAIN_ANALYZE │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 0 │
│ (0.15s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PARQUET_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ generate_series │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 109322272 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 100000001 │
│ (57.97s) │
└───────────────────────────┘
Generate an empty Parquet file and put it first in the file list. DuckDB then uses only a single thread for the read.
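The report does not show how empty.parquet was created. A minimal sketch, assuming the file only needs the same schema as tbl and zero rows (the limit 0 approach is an assumption, not necessarily what was used here):

```sql
-- Assumption: "empty" means zero rows with the same column as tbl.
-- The method actually used for the report is not shown in the issue.
copy (select * from tbl limit 0) to 'test-parquet/empty.parquet' (format parquet);
```

The name empty.parquet does not match the data*.parquet glob, so the first query above is unaffected by this file.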
explain analyze select * from read_parquet(['test-parquet/empty.parquet', 'test-parquet/data*.parquet']);
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Total Time: 52.08s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│ EXPLAIN_ANALYZE │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 0 │
│ (0.12s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ READ_PARQUET │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ generate_series │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 0 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 100000001 │
│ (34.59s) │
└───────────────────────────┘
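Since the slowdown is attributed to the empty file being first, one way to probe that is to confirm the configured thread count and rerun the same query with the empty file last. This is a suggested check, not something run for this report, and no result is claimed here:

```sql
-- Confirm how many threads DuckDB is configured to use.
select current_setting('threads') as threads;

-- Same files as above, but with the empty file at the end of the list.
explain analyze select * from read_parquet(['test-parquet/data*.parquet', 'test-parquet/empty.parquet']);
```

If the total time drops back toward the multi-threaded run, that would point at the position of the empty file rather than its mere presence.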
OS:
Ubuntu 22.04
DuckDB Version:
v0.9.3-dev1411 7d5150c
DuckDB Client:
CLI
Full Name:
Yiyuan Liu
Affiliation:
High-Flyer AI
Have you tried this on the latest main branch?
I have tested with a main build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have