Skip to content

window "lead" incorrectly yields null data in some groups #14398

@hottwaj

Description

@hottwaj

What happens?

A 'LEAD(colname, 2)' operation over a simple partition produces incorrect results for one partition

Example using DuckDB live here

Tested on ubuntu and WASM
The output of the code below produces incorrect output for group 6 on date 2024-01-01 (row 4). DuckDb produces:

┌────────────┬───────┬───────┬────────────┬────────────┬────────────┐
│ date       ┆ group ┆ count ┆ status     ┆ end_date   ┆ end_status │
╞════════════╪═══════╪═══════╪════════════╪════════════╪════════════╡
│ 2024-01-01 ┆     3 ┆  3000 ┆ ordered    ┆ 2024-03-01 ┆ late       │
│ 2024-02-01 ┆     3 ┆  3000 ┆ ordered    ┆            ┆            │
│ 2024-03-01 ┆     3 ┆  3000 ┆ late       ┆            ┆            │
│ 2024-01-01 ┆     6 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-02-01 ┆     6 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-03-01 ┆     6 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-01-01 ┆     4 ┆  4000 ┆ ordered    ┆ 2024-03-01 ┆ ordered    │
│ 2024-02-01 ┆     4 ┆  4000 ┆ ordered    ┆            ┆            │
│ 2024-03-01 ┆     4 ┆  4000 ┆ ordered    ┆            ┆            │
│ 2024-01-01 ┆     1 ┆  1000 ┆ ordered    ┆ 2024-03-01 ┆ dispatched │
│ 2024-02-01 ┆     1 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-03-01 ┆     1 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-01-01 ┆     7 ┆  1000 ┆ late       ┆ 2024-03-01 ┆ dispatched │
│ 2024-02-01 ┆     7 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-03-01 ┆     7 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-01-01 ┆     2 ┆  2000 ┆ ordered    ┆ 2024-03-01 ┆ ordered    │
│ 2024-02-01 ┆     2 ┆  2000 ┆ ordered    ┆            ┆            │
│ 2024-03-01 ┆     2 ┆  2000 ┆ ordered    ┆            ┆            │
│ 2024-01-01 ┆     5 ┆  5000 ┆ ordered    ┆ 2024-03-01 ┆ ordered    │
│ 2024-02-01 ┆     5 ┆  5000 ┆ late       ┆            ┆            │
│ 2024-03-01 ┆     5 ┆  5000 ┆ ordered    ┆            ┆            │
└────────────┴───────┴───────┴────────────┴────────────┴────────────┘    

Correct output for that row is:

2024-01-01 ┆  6 ┆    1000 ┆ dispatched   ┆ 2024-03-01 ┆ dispatched  

To Reproduce

following code produces the error:

CREATE TABLE testdata (date DATE, "group" INT, count INT, status STRING);
INSERT INTO testdata VALUES
('2024-01-01', 1, 1000, 'ordered'),
('2024-02-01', 1, 1000, 'dispatched'),
('2024-03-01', 1, 1000, 'dispatched'),
('2024-01-01', 2, 2000, 'ordered'),
('2024-02-01', 2, 2000, 'ordered'),
('2024-03-01', 2, 2000, 'ordered'),
('2024-01-01', 3, 3000, 'ordered'),
('2024-02-01', 3, 3000, 'ordered'),
('2024-03-01', 3, 3000, 'late'),
('2024-01-01', 4, 4000, 'ordered'),
('2024-02-01', 4, 4000, 'ordered'),
('2024-03-01', 4, 4000, 'ordered'),
('2024-01-01', 5, 5000, 'ordered'),
('2024-02-01', 5, 5000, 'late'),
('2024-03-01', 5, 5000, 'ordered'),
('2024-01-01', 6, 1000, 'dispatched'),
('2024-02-01', 6, 1000, 'dispatched'),
('2024-03-01', 6, 1000, 'dispatched'),
('2024-01-01', 7, 1000, 'late'),
('2024-02-01', 7, 1000, 'dispatched'),
('2024-03-01', 7, 1000, 'dispatched');


SELECT
  "t0"."date",
  "t0"."group",
  "t0"."count",
  "t0"."status",
  LEAD("t0"."date", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_date",
  LEAD("t0"."status", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_status"
FROM "testdata" AS "t0";

If only data for group 6 is inserted into the table, the correct output is produced.

DROP TABLE testdata;

CREATE TABLE testdata (date DATE, "group" INT, count INT, status STRING);
INSERT INTO testdata VALUES
('2024-01-01', 6, 1000, 'dispatched'),
('2024-02-01', 6, 1000, 'dispatched'),
('2024-03-01', 6, 1000, 'dispatched');

SELECT
  "t0"."date",
  "t0"."group",
  "t0"."count",
  "t0"."status",
  LEAD("t0"."date", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_date",
  LEAD("t0"."status", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_status"
FROM "testdata" AS "t0";

Output:

┌────────────┬───────┬───────┬────────────┬────────────┬────────────┐
│ date       ┆ group ┆ count ┆ status     ┆ end_date   ┆ end_status │
╞════════════╪═══════╪═══════╪════════════╪════════════╪════════════╡
│ 2024-01-01 ┆     6 ┆  1000 ┆ dispatched ┆ 2024-03-01 ┆ dispatched │
│ 2024-02-01 ┆     6 ┆  1000 ┆ dispatched ┆            ┆            │
│ 2024-03-01 ┆     6 ┆  1000 ┆ dispatched ┆            ┆            │
└────────────┴───────┴───────┴────────────┴────────────┴────────────┘

OS:

ubuntu 24.04, x86_64

DuckDB Version:

1.1.2

DuckDB Client:

python

Hardware:

No response

Full Name:

Jonathan Clarke

Affiliation:

FiveSigma

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions