Skip to content

performance change in window functions in v1.2.0 #16250

@manlon

Description

@manlon

What happens?

After upgrading to duckdb 1.2.0 I have noticed certain queries with complicated window functions or multiple window clauses are much slower to execute.

To Reproduce

Here is an example:

select
    count(distinct uid) over preceding_year as uid_count_rolling,
    count(distinct uid) over cumulative as uid_count_cumulative,
    count(distinct (f1, f2)) over preceding_year as f_rolling,
    count(distinct (f1, f2)) over cumulative as f_cumulative,
from t1

window
    preceding_year as (
        partition by g1, g2, g3, g4
        order by d asc
        range between interval 11 months preceding and current row
    ),
    cumulative as (
        partition by g1, g2, g3, g4
        order by d asc
        rows between unbounded preceding and current row
    )

On my machine on a table with ~16M rows this took 1m44s on v1.2.0 but only 26s for v1.1.3, so ~4x, and I have seen more slowdown for more complicated queries. I can provide the table if needed (~200 MB).

OS:

macOS arm / apple silicon

DuckDB Version:

1.2.0

DuckDB Client:

CLI

Hardware:

repro with 8 core/ 8GB mem and with 16 core / 128GB

Full Name:

Matt Hanlon

Affiliation:

PwC

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

No - I cannot easily share my data sets due to their large size

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions