Skip to content

Doing HASH_JOIN instead of SEQ_SCAN even when driving table has single record #4974

@SartorialOffense

Description

@SartorialOffense

What happens?

When joining 2 single column tables A and B, the optimizer is picking a HASH_JOIN even when it knows the cardinality of the driving table is as low as 1 (it does short circuit if it is zero). Is there anyway to prevent this and use a more targeted plan for cases like this?

And thanks again for this fantastic tool!

To Reproduce

create table data as select cast(i as integer) a from range(1, 50000000, 1) t(i);

pragma enable_profiling;

--does a SEQ_SCAN of data in <1 ms
select * from data where a = 1;

pragma disable_profiling;

create table query as select a from data where false;

insert into query values(1);

pragma enable_profiling;

--does a HASH_JOIN in 170ms
select data.* from query join data on query.a = data.a;

--does a HASH_JOIN in 300ms?
select data.* from (VALUES (1)) AS query(A) join data on query.a = data.a;

OS:

Windows 10

DuckDB Version:

0.5.2-dev742

DuckDB Client:

CLI

Full Name:

Jonathan Trumbull

Affiliation:

AbbVie

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions