[Python Arrow Scan] cannot join blob columns #14344

@bugzpodder

Description

What happens?

Joining on a binary (blob) column raises an error in DuckDB v1.1.1. The same query works in v1.0.0.

To Reproduce

import hashlib

import duckdb
import pyarrow as pa

# SHA-256 digest of "foo": raw bytes that are not valid UTF-8
digest = hashlib.sha256("foo".encode()).digest()
my_table = pa.Table.from_pydict({"foo": pa.array([digest], type=pa.binary())})
my_table2 = pa.Table.from_pydict({"foo": pa.array([digest], type=pa.binary()), "a": ["123"]})
# Joining on the binary column raises the error below on v1.1.1
duckdb.sql("SELECT my_table2.* EXCLUDE (foo) FROM my_table LEFT JOIN my_table2 USING (foo)").show()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
duckdb.duckdb.Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 2: invalid start byte
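
The byte and position in the traceback match what you get when the raw SHA-256 digest is decoded as UTF-8, which suggests the blob value is being treated as a string somewhere in the join path. This is only a standard-library sketch of that observation and does not touch DuckDB or pyarrow. The helper name `try_decode_utf8` is made up for illustration.

```python
import hashlib

# Same key value as in the reproduction: the SHA-256 digest of "foo".
digest = hashlib.sha256("foo".encode()).digest()


def try_decode_utf8(data: bytes) -> str:
    """Return the decode error message, or a note that decoding succeeded."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return "decoded without error"


message = try_decode_utf8(digest)
print(digest[:3].hex())  # first three bytes of the digest, as hex
print(message)           # error text produced by the UTF-8 decode attempt
```

The third byte of the digest (index 2) is 0xb4, a UTF-8 continuation byte that cannot start a character, so the decode attempt fails at the same byte and position as the error reported above.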

OS:

macOS

DuckDB Version:

1.1.1

DuckDB Client:

Python

Hardware:

No response

Full Name:

Jack Zhao

Affiliation:

Delphina.ai

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
