-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Description
What happens?
When joining 2 single column tables A and B, the optimizer is picking a HASH_JOIN even when it knows the cardinality of the driving table is as low as 1 (it does short circuit if it is zero). Is there anyway to prevent this and use a more targeted plan for cases like this?
And thanks again for this fantastic tool!
To Reproduce
create table data as select cast(i as integer) a from range(1, 50000000, 1) t(i);
pragma enable_profiling;
--does a SEQ_SCAN of data in <1 ms
select * from data where a = 1;
pragma disable_profiling;
create table query as select a from data where false;
insert into query values(1);
pragma enable_profiling;
--does a HASH_JOIN in 170ms
select data.* from query join data on query.a = data.a;
--does a HASH_JOIN in 300ms?
select data.* from (VALUES (1)) AS query(A) join data on query.a = data.a;
OS:
Windows 10
DuckDB Version:
0.5.2-dev742
DuckDB Client:
CLI
Full Name:
Jonathan Trumbull
Affiliation:
AbbVie
Have you tried this on the latest master
branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree