Incorrect hashjoin_highcardinality benchmark #17521

@swaingotnochill

What happens?

Hi Team,

While running the benchmark suite, I came across this unexpected behavior:

➜  duckdb git:(main) ✗ build/release/benchmark/benchmark_runner benchmark/micro/join/hashjoin_highcardinality.benchmark
name    run     timing
benchmark/micro/join/hashjoin_highcardinality.benchmark 1       INCORRECT
INCORRECT RESULT: Catalog Error: Column with name v2 already exists!

====================================================
================  FAILURES SUMMARY  ================
====================================================

1: benchmark/micro/join/hashjoin_highcardinality.benchmark
name    run     timing
benchmark/micro/join/hashjoin_highcardinality.benchmark 1               INCORRECT
INCORRECT RESULT: Catalog Error: Column with name v2 already exists!

However, when I run the same queries in the DuckDB CLI, I get the correct output:

D create table t1 as select i as v1, i as v2 from range (0, 1000) t(i);
D create table t2 as select i as v1, i as v2 from range (0, 10000000) t(i);
D select t1.v2, t2.v2, count(*) from t1 inner join t2 on (t1.v1 = t2.v1) group by t1.v2, t2.v2 order by t1.v2 limit 5;
┌───────┬───────┬──────────────┐
│  v2   │  v2   │ count_star() │
│ int64 │ int64 │    int64     │
├───────┼───────┼──────────────┤
│     0 │     0 │            1 │
│     1 │     1 │            1 │
│     2 │     2 │            1 │
│     3 │     3 │            1 │
│     4 │     4 │            1 │
└───────┴───────┴──────────────┘

To Reproduce

  1. Build DuckDB with the benchmark runner enabled: BUILD_BENCHMARK=1 BUILD_TPCH=1 make

  2. Run the benchmark: build/release/benchmark/benchmark_runner benchmark/micro/join/hashjoin_highcardinality.benchmark

OS:

macOS

DuckDB Version:

Built from source (master)

DuckDB Client:

None

Hardware:

No response

Full Name:

Roshan Swain

Affiliation:

Individual Contributor

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a source build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
