
Segfault With Copy + Positional Join with Parquet Version V2 (not V1) #17682

@J-Meyers

Description


What happens?

DuckDB crashes with a segmentation fault when trying to COPY the result of a POSITIONAL JOIN with PARQUET_VERSION V2, but not with V1.

To Reproduce

I have been unable to generate equivalent synthetic data, so attached is my trimmed-down actual data in CSV form, one column per file. There is some size component to this crash: if I limit to fewer than 1000 data points, it doesn't appear. It also crashes on my real data, but not when using a simple unnest(generate_series source.
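For reference, a minimal sketch of the kind of synthetic attempt that did not reproduce the crash for me (row count and column names here are illustrative, not the actual data):

COPY (
 SELECT * FROM (SELECT unnest(generate_series(1, 2000)) AS a)
 POSITIONAL JOIN (SELECT unnest(generate_series(1, 2000)) AS b)
 ) TO 'test_synth.parquet'
 (PARQUET_VERSION V2);

This completes without error, which is why the attached CSVs are needed to reproduce.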

tbl1.csv

tbl2.csv

This works:

COPY (
 SELECT * FROM 'tbl1.csv'
 POSITIONAL JOIN (FROM 'tbl2.csv')
 ) TO 'test_out.parquet'
 (PARQUET_VERSION V1);

This doesn't:

COPY (
 SELECT * FROM 'tbl1.csv'
 POSITIONAL JOIN (FROM 'tbl2.csv')
 ) TO 'test_out.parquet'
 (PARQUET_VERSION V2);

When running the rough equivalent of the above from Python with Ray, I also see: PC: @ 0x7f47e9339b0d (unknown) duckdb::StandardColumnWriter<>::WriteVectorInternal<>()

Backtrace:

(backtrace attached as a screenshot)

OS:

Linux x86_64

DuckDB Version:

1.3.0

DuckDB Client:

CLI

Hardware:

No response

Full Name:

Julian Meyers

Affiliation:

Personal Use / Advanced Robotics Group (when working)

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a source build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
