Skip to content

UNPIVOT statement is extremely slow when working with many columns #13552

@maver1ck

Description

@maver1ck

What happens?

I'm trying to unpivot table with 30k+ columns. It is extremely slow on DuckDB.
In comparision similar operation in Pandas takes few seconds.

For 10k columns it takes 117 seconds on DuckDB vs 2,7 sec on Pandas

To Reproduce

For reporting purposes please find this example:

Data generation

create table r as select id, concat('column_', id) as column_name, concat('value_', id) as  column_value from (select unnest(generate_series(1,10000)) as id);
create table pivoted as pivot r on column_name using first(id) group by id;

Benchmark:

select count(*) from (unpivot pivoted on columns(* exclude id) into name 'column_name' value 'column_value');
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│        10000 │
└──────────────┘
Run Time (s): real 117.393 user 215.706619 sys 18.343984

Pandas for comparison

import pandas as pd
import time

# Step 1: Create the DataFrame
data = {
    'id': range(10000),
    'column_name': ['column_' + str(i) for i in range(10000)],
    'column_value': range(10000)
}

df = pd.DataFrame(data)

# Step 2: Pivot the DataFrame
pivot_df = df.pivot(index='id', columns='column_name', values='column_value')

# Step 3: Melt the DataFrame back
start = time.time()
melted_df = pivot_df.reset_index().melt(id_vars='id', var_name='column_name', value_name='column_value')
print(time.time() - start)
# 2.771090269088745

OS:

MacOS

DuckDB Version:

1.0.0

DuckDB Client:

CLI

Full Name:

Maciej Bryński

Affiliation:

Cledar

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions