Skip to content

Parquet Write value count mismatch when writing DELTA_BINARY_PACKED #16306

@xevix

Description

@xevix

What happens?

Writing out a Parquet file using COPY hits an assertion failure. Works in DuckDB 1.2, but not in main HEAD compiled 2025/02/18.

Data source: https://noaa-ghcn-pds.s3.amazonaws.com/index.html#parquet/by_year/

To Reproduce

COPY (
    SELECT * FROM read_parquet('**/*.parquet', union_by_name = true)
    WHERE year
    BETWEEN 2010 AND 2015 ORDER BY element, obs_time
)
TO 'weather_v2_zstd_2025_02_18_HEAD.parquet'
(PARQUET_VERSION V2, COMPRESSION 'zstd');
INTERNAL Error:
value count mismatch when writing DELTA_BINARY_PACKED

Stack Trace:

0        duckdb::Exception::Exception(duckdb::ExceptionType, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 64
1        duckdb::InternalException::InternalException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 20
2        duckdb::DbpEncoder::FinishWrite(duckdb::WriteStream&) + 128
3        duckdb::StandardColumnWriter<duckdb::string_t, duckdb::string_t, duckdb::ParquetStringOperator>::FlushPageState(duckdb::WriteStream&, duckdb::ColumnWriterPageState*) + 268
4        duckdb::PrimitiveColumnWriter::FlushPage(duckdb::PrimitiveColumnWriterState&) + 124
5        duckdb::PrimitiveColumnWriter::NextPage(duckdb::PrimitiveColumnWriterState&) + 52
6        duckdb::PrimitiveColumnWriter::Write(duckdb::ColumnWriterState&, duckdb::Vector&, unsigned long long) + 204
7        duckdb::ParquetWriter::PrepareRowGroup(duckdb::ColumnDataCollection&, duckdb::PreparedRowGroup&) + 6036
8        duckdb::ParquetWritePrepareBatch(duckdb::ClientContext&, duckdb::FunctionData&, duckdb::GlobalFunctionData&, duckdb::unique_ptr<duckdb::ColumnDataCollection, std::__1::default_delete<duckdb::ColumnDataCollection>, true>) + 144
9        duckdb::PrepareBatchTask::Execute(duckdb::PhysicalBatchCopyToFile const&, duckdb::ClientContext&, duckdb::GlobalSinkState&) + 132
10       duckdb::PhysicalBatchCopyToFile::ExecuteTask(duckdb::ClientContext&, duckdb::GlobalSinkState&) const + 232
11       duckdb::ProcessRemainingBatchesTask::ExecuteTask(duckdb::TaskExecutionMode) + 32
12       duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) + 236
13       duckdb::TaskScheduler::ExecuteForever(std::__1::atomic<bool>*) + 612
14       void* std::__1::__thread_proxy[abi:ne180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(duckdb::TaskScheduler*, std::__1::atomic<bool>*), duckdb::TaskScheduler*, std::__1::atomic<bool>*>>(void*) + 56
15       _pthread_start + 136
16       thread_start + 8

This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors

OS:

macOS Sequoia 15.3.1

DuckDB Version:

main/HEAD e249a40

DuckDB Client:

CLI

Hardware:

No response

Full Name:

Alejandro Wainzinger

Affiliation:

N/A

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a source build

Did you include all relevant data sets for reproducing the issue?

No - I cannot easily share my data sets due to their large size

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions