Window function slower _without_ order by clause #18631

@soerenwolfers

What happens?

create or replace table df as (
    select random() as value from range(50_000)
);

Then

select 
    rowid,
    q: quantile_disc(value, 0.2) over (rows between unbounded preceding and current row) 
from
    df
qualify rowid % 10_000 = 0

takes 24s, whereas the logically equivalent

select 
    rowid,
    q: quantile_disc(value, 0.2) over (order by rowid rows between unbounded preceding and current row) 
from
    df
qualify rowid % 10_000 = 0

takes 0.16s. Note that the only difference is the explicit order by rowid, which is the ordering that is implicitly used anyway.
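To make the "logically equivalent" claim concrete, here is a minimal pure-Python sketch of what both queries are expected to return: for each row, in rowid order, the discrete 0.2-quantile of every value seen so far. It is only a reference for the expected results, not DuckDB's implementation, and it says nothing about the cause of the slowdown. The function names are hypothetical, the nearest-rank quantile definition is an assumption, and the row count is scaled down so the naive version finishes quickly.

```python
import bisect
import math
import random


def running_quantile_disc(values, q):
    """Yield the discrete q-quantile over the frame
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, one result per row.

    Assumption: nearest-rank definition, i.e. the ceil(q * n)-th smallest
    of the n values in the frame (never below the first).
    """
    frame = []  # sorted copy of every value seen so far
    for v in values:
        bisect.insort(frame, v)          # the frame grows by exactly one row
        n = len(frame)
        rank = max(1, math.ceil(q * n))  # 1-based nearest rank
        yield frame[rank - 1]


def naive_quantile_disc(values, q):
    """Same results, but re-sorts the whole frame for every row."""
    out = []
    for i in range(len(values)):
        frame = sorted(values[: i + 1])
        rank = max(1, math.ceil(q * len(frame)))
        out.append(frame[rank - 1])
    return out


random.seed(0)
# Scaled down from the 50_000 rows in the report (assumption, for speed).
values = [random.random() for _ in range(2_000)]

incremental = list(running_quantile_disc(values, 0.2))
# Both strategies must agree row for row.
assert incremental == naive_quantile_disc(values, 0.2)

# Scaled-down mirror of: qualify rowid % 10_000 = 0
for rowid in range(0, len(values), 500):
    print(rowid, incremental[rowid])
```

Both functions walk the rows in insertion order, which is the ordering the report assumes for the query without an explicit order by. If that assumption holds, the two SQL queries should return identical rows and differ only in runtime.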

To Reproduce

.

OS:

ubuntu22

DuckDB Version:

1.4.0-dev861

DuckDB Client:

Python

Hardware:

intel64, 16gb, 4 cores

Full Name:

Soeren Wolfers

Affiliation:

G-Research

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
