Skip to content

Incorrect results for IN clause #14232

@milindcmodak

Description

@milindcmodak

What happens?

When compared to PostgreSQL , DuckDB is giving incorrect results. Spark SQL and PostgreSQL give 2 rows and DuckDB gives 3 rows.
PostgreSQL and Spark SQL results
t1b,8
t1c,8

DuckDB results
t1e,10
t1b,8
t1c,8

To Reproduce

-- Create table t1
CREATE TABLE t1 (
t1a TEXT,
t1b SMALLINT,
t1c INTEGER,
t1d BIGINT,
t1e REAL,
t1f DOUBLE PRECISION,
t1g NUMERIC(10,2),
t1h TIMESTAMP,
t1i DATE
);

INSERT INTO t1 (t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) VALUES
('t1a', 6, 8, 10, 15.0, 20, 20.00, '2014-04-04 01:00:00', '2014-04-04'),
('t1b', 8, 16, 19, 17.0, 25, 26.00, '2014-05-04 01:01:00', '2014-05-04'),
('t1a', 16, 12, 21, 15.0, 20, 20.00, '2014-06-04 01:02:00.001', '2014-06-04'),
('t1a', 16, 12, 10, 15.0, 20, 20.00, '2014-07-04 01:01:00', '2014-07-04'),
('t1c', 8, 16, 19, 17.0, 25, 26.00, '2014-05-04 01:02:00.001', '2014-05-05'),
('t1d', NULL, 16, 22, 17.0, 25, 26.00, '2014-06-04 01:01:00', NULL),
('t1d', NULL, 16, 19, 17.0, 25, 26.00, '2014-07-04 01:02:00.001', NULL),
('t1e', 10, NULL, 25, 17.0, 25, 26.00, '2014-08-04 01:01:00', '2014-08-04'),
('t1e', 10, NULL, 19, 17.0, 25, 26.00, '2014-09-04 01:02:00.001', '2014-09-04'),
('t1d', 10, NULL, 12, 17.0, 25, 26.00, '2015-05-04 01:01:00', '2015-05-04'),
('t1a', 6, 8, 10, 15.0, 20, 20.00, '2014-04-04 01:02:00.001', '2014-04-04'),
('t1e', 10, NULL, 19, 17.0, 25, 26.00, '2014-05-04 01:01:00', '2014-05-04');

-- Create table t2
CREATE TABLE t2 (
t2a TEXT,
t2b SMALLINT,
t2c INTEGER,
t2d BIGINT,
t2e REAL,
t2f DOUBLE PRECISION,
t2g NUMERIC(10,2),
t2h TIMESTAMP,
t2i DATE
);

INSERT INTO t2 (t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) VALUES
('t2a', 6, 12, 14, 15.0, 20, 20.00, '2014-04-04 01:01:00', '2014-04-04'),
('t1b', 10, 12, 19, 17.0, 25, 26.00, '2014-05-04 01:01:00', '2014-05-04'),
('t1b', 8, 16, 119, 17.0, 25, 26.00, '2015-05-04 01:01:00', '2015-05-04'),
('t1c', 12, 16, 219, 17.0, 25, 26.00, '2016-05-04 01:01:00', '2016-05-04'),
('t1b', NULL, 16, 319, 17.0, 25, 26.00, '2017-05-04 01:01:00', NULL),
('t2e', 8, NULL, 419, 17.0, 25, 26.00, '2014-06-04 01:01:00', '2014-06-04'),
('t1f', 19, NULL, 519, 17.0, 25, 26.00, '2014-05-04 01:01:00', '2014-05-04'),
('t1b', 10, 12, 19, 17.0, 25, 26.00, '2014-06-04 01:01:00', '2014-06-04'),
('t1b', 8, 16, 19, 17.0, 25, 26.00, '2014-07-04 01:01:00', '2014-07-04'),
('t1c', 12, 16, 19, 17.0, 25, 26.00, '2014-08-04 01:01:00', '2014-08-05'),
('t1e', 8, NULL, 19, 17.0, 25, 26.00, '2014-09-04 01:01:00', '2014-09-04'),
('t1f', 19, NULL, 19, 17.0, 25, 26.00, '2014-10-04 01:01:00', '2014-10-04'),
('t1b', NULL, 16, 19, 17.0, 25, 26.00, '2014-05-04 01:01:00', NULL);

SELECT t1a,
t1b
FROM t1
WHERE t1c IN (SELECT t2c
FROM t2
WHERE t1a = t2a)
GROUP BY t1a,
t1b;

OS:

x86_64

DuckDB Version:

1.1.1

DuckDB Client:

Python

Hardware:

No response

Full Name:

Rishab C

Affiliation:

Student

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Labels

    PR submittedA pull request was submitted to fix the issuereproduced

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions