-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Closed
Labels
Description
What happens?
A 'LEAD(colname, 2)' operation over a simple partition produces incorrect results for one partition
Example using DuckDB live here
Tested on ubuntu and WASM
The output of the code below produces incorrect output for group 6 on date 2024-01-01 (row 4). DuckDb produces:
┌────────────┬───────┬───────┬────────────┬────────────┬────────────┐
│ date ┆ group ┆ count ┆ status ┆ end_date ┆ end_status │
╞════════════╪═══════╪═══════╪════════════╪════════════╪════════════╡
│ 2024-01-01 ┆ 3 ┆ 3000 ┆ ordered ┆ 2024-03-01 ┆ late │
│ 2024-02-01 ┆ 3 ┆ 3000 ┆ ordered ┆ ┆ │
│ 2024-03-01 ┆ 3 ┆ 3000 ┆ late ┆ ┆ │
│ 2024-01-01 ┆ 6 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-02-01 ┆ 6 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-03-01 ┆ 6 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-01-01 ┆ 4 ┆ 4000 ┆ ordered ┆ 2024-03-01 ┆ ordered │
│ 2024-02-01 ┆ 4 ┆ 4000 ┆ ordered ┆ ┆ │
│ 2024-03-01 ┆ 4 ┆ 4000 ┆ ordered ┆ ┆ │
│ 2024-01-01 ┆ 1 ┆ 1000 ┆ ordered ┆ 2024-03-01 ┆ dispatched │
│ 2024-02-01 ┆ 1 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-03-01 ┆ 1 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-01-01 ┆ 7 ┆ 1000 ┆ late ┆ 2024-03-01 ┆ dispatched │
│ 2024-02-01 ┆ 7 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-03-01 ┆ 7 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-01-01 ┆ 2 ┆ 2000 ┆ ordered ┆ 2024-03-01 ┆ ordered │
│ 2024-02-01 ┆ 2 ┆ 2000 ┆ ordered ┆ ┆ │
│ 2024-03-01 ┆ 2 ┆ 2000 ┆ ordered ┆ ┆ │
│ 2024-01-01 ┆ 5 ┆ 5000 ┆ ordered ┆ 2024-03-01 ┆ ordered │
│ 2024-02-01 ┆ 5 ┆ 5000 ┆ late ┆ ┆ │
│ 2024-03-01 ┆ 5 ┆ 5000 ┆ ordered ┆ ┆ │
└────────────┴───────┴───────┴────────────┴────────────┴────────────┘
Correct output for that row is:
2024-01-01 ┆ 6 ┆ 1000 ┆ dispatched ┆ 2024-03-01 ┆ dispatched
To Reproduce
following code produces the error:
CREATE TABLE testdata (date DATE, "group" INT, count INT, status STRING);
INSERT INTO testdata VALUES
('2024-01-01', 1, 1000, 'ordered'),
('2024-02-01', 1, 1000, 'dispatched'),
('2024-03-01', 1, 1000, 'dispatched'),
('2024-01-01', 2, 2000, 'ordered'),
('2024-02-01', 2, 2000, 'ordered'),
('2024-03-01', 2, 2000, 'ordered'),
('2024-01-01', 3, 3000, 'ordered'),
('2024-02-01', 3, 3000, 'ordered'),
('2024-03-01', 3, 3000, 'late'),
('2024-01-01', 4, 4000, 'ordered'),
('2024-02-01', 4, 4000, 'ordered'),
('2024-03-01', 4, 4000, 'ordered'),
('2024-01-01', 5, 5000, 'ordered'),
('2024-02-01', 5, 5000, 'late'),
('2024-03-01', 5, 5000, 'ordered'),
('2024-01-01', 6, 1000, 'dispatched'),
('2024-02-01', 6, 1000, 'dispatched'),
('2024-03-01', 6, 1000, 'dispatched'),
('2024-01-01', 7, 1000, 'late'),
('2024-02-01', 7, 1000, 'dispatched'),
('2024-03-01', 7, 1000, 'dispatched');
SELECT
"t0"."date",
"t0"."group",
"t0"."count",
"t0"."status",
LEAD("t0"."date", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_date",
LEAD("t0"."status", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_status"
FROM "testdata" AS "t0";
If only data for group 6 is inserted into the table, the correct output is produced.
DROP TABLE testdata;
CREATE TABLE testdata (date DATE, "group" INT, count INT, status STRING);
INSERT INTO testdata VALUES
('2024-01-01', 6, 1000, 'dispatched'),
('2024-02-01', 6, 1000, 'dispatched'),
('2024-03-01', 6, 1000, 'dispatched');
SELECT
"t0"."date",
"t0"."group",
"t0"."count",
"t0"."status",
LEAD("t0"."date", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_date",
LEAD("t0"."status", 2) OVER (PARTITION BY "t0"."group" ORDER BY "t0"."date" ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "end_status"
FROM "testdata" AS "t0";
Output:
┌────────────┬───────┬───────┬────────────┬────────────┬────────────┐
│ date ┆ group ┆ count ┆ status ┆ end_date ┆ end_status │
╞════════════╪═══════╪═══════╪════════════╪════════════╪════════════╡
│ 2024-01-01 ┆ 6 ┆ 1000 ┆ dispatched ┆ 2024-03-01 ┆ dispatched │
│ 2024-02-01 ┆ 6 ┆ 1000 ┆ dispatched ┆ ┆ │
│ 2024-03-01 ┆ 6 ┆ 1000 ┆ dispatched ┆ ┆ │
└────────────┴───────┴───────┴────────────┴────────────┴────────────┘
OS:
ubuntu 24.04, x86_64
DuckDB Version:
1.1.2
DuckDB Client:
python
Hardware:
No response
Full Name:
Jonathan Clarke
Affiliation:
FiveSigma
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have