Skip to content

COLUMNS(table_name.*) is broken when there are joins #9867

@code1704

Description

@code1704

What happens?

If our SQL query has JOINs, COLUMNS(table_name.*) selects all columns from the joined set of columns instead of only the columns from table_name.

To Reproduce

import duckdb
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                    'A': [11, 12, 13, 14, 15, 16],
                    'B': [21, 22, 23, 24, 25, 26]})

df2 = pd.DataFrame({'key': ['K0', 'K2', 'K5'],
                    'C': [2, 3, 5]})

df = duckdb.query("""
    select sin(columns(df1.* exclude (key))) from df1 join df2 using(key)
""").df()
print(df)

It produces:

   sin(df1.A)  sin(df1.B)  sin(df2.C)
0   -0.999990    0.836656    0.909297
1    0.420167   -0.846220    0.141120
2   -0.287903    0.762558   -0.958924

However, the results shouldn't have any columns from df2.

OS:

Ubuntu 22.04.2 LTS, aarch64

DuckDB Version:

0.9.2

DuckDB Client:

Python

Full Name:

Mark

Affiliation:

bq

Have you tried this on the latest main branch?

I have not tested with any build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions