Implement `DELTA_BINARY_PACKED` compression in Parquet writer #14257

lnkuiper · 2024-10-07T13:18:49Z

As the title says, this PR implements `DELTA_BINARY_PACKED` compression for the Parquet writer, reusing our own bitpacking code. I've also reworked the reader to use the same bitpacking code. The compression ratio for integral columns should be much better now.
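The per-block idea behind `DELTA_BINARY_PACKED` can be sketched as follows. This is a minimal illustration that assumes the block layout described in the Apache Parquet specification. It is not DuckDB's writer or reader code: `BlockLayout`, `LayoutBlock`, `ReconstructBlock`, `BitWidth` and `ZigZag` are hypothetical names. The final bit-packing step (which, per the description above, reuses DuckDB's own bitpacking code) and the ULEB128 serialization are left out. Each block stores its minimum delta once. Every miniblock then needs only enough bits for its largest delta after that minimum is subtracted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of one DELTA_BINARY_PACKED block (no serialization).
struct BlockLayout {
	int64_t min_delta = 0;              // written once per block as a zigzag ULEB128 varint
	std::vector<uint8_t> bit_widths;    // one byte per miniblock
	std::vector<uint64_t> packed_input; // deltas minus min_delta: the values that get bit-packed
};

// Zigzag maps signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3.
uint64_t ZigZag(int64_t n) {
	return (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
}

// Number of bits needed to store v (0 needs 0 bits).
uint8_t BitWidth(uint64_t v) {
	uint8_t width = 0;
	while (v != 0) {
		width++;
		v >>= 1;
	}
	return width;
}

// `previous` is the value preceding the block (the header's first value for block one).
// Differences wrap in two's complement, which the Parquet spec permits.
BlockLayout LayoutBlock(const std::vector<int64_t> &values, int64_t previous, size_t miniblock_size) {
	BlockLayout layout;
	if (values.empty() || miniblock_size == 0) {
		return layout;
	}
	std::vector<int64_t> deltas;
	for (int64_t v : values) {
		deltas.push_back(static_cast<int64_t>(static_cast<uint64_t>(v) - static_cast<uint64_t>(previous)));
		previous = v;
	}
	layout.min_delta = *std::min_element(deltas.begin(), deltas.end());
	for (int64_t d : deltas) {
		// d >= min_delta, so the unsigned difference is the exact non-negative offset.
		layout.packed_input.push_back(static_cast<uint64_t>(d) - static_cast<uint64_t>(layout.min_delta));
	}
	// Each miniblock gets the narrowest bit width that fits its largest offset.
	for (size_t start = 0; start < layout.packed_input.size(); start += miniblock_size) {
		size_t end = std::min(start + miniblock_size, layout.packed_input.size());
		uint64_t max_offset = *std::max_element(layout.packed_input.begin() + start, layout.packed_input.begin() + end);
		layout.bit_widths.push_back(BitWidth(max_offset));
	}
	return layout;
}

// Inverse of LayoutBlock, as a reader would apply it: value = previous + min_delta + offset.
std::vector<int64_t> ReconstructBlock(const BlockLayout &layout, int64_t previous) {
	std::vector<int64_t> values;
	for (uint64_t offset : layout.packed_input) {
		uint64_t delta = offset + static_cast<uint64_t>(layout.min_delta);
		previous = static_cast<int64_t>(static_cast<uint64_t>(previous) + delta);
		values.push_back(previous);
	}
	return values;
}
```

For slowly increasing data such as sorted IDs or timestamps, the deltas are small and similar, so the offsets fit in a handful of bits (or zero bits when the deltas are constant). Keeping a separate width per miniblock confines the cost of an outlier to the miniblock that contains it. The specification requires the number of values per miniblock to be a multiple of 32; a smaller `miniblock_size` only makes the arithmetic easy to follow by hand.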

I've added this on a branch where I was enabling ARM Neon SIMD instructions for Snappy on AArch64. It turns out that Neon is always available on AArch64, so we can enable it in Snappy for faster (de)compression on this CPU architecture. On x86, enabling SIMD is much harder because the resulting code is not portable to older CPUs. ARM is relatively young in that sense, so we can safely enable it there.
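The compile-time gating described above can be sketched like this. The sketch assumes GCC or Clang, which define `__aarch64__` when targeting 64-bit ARM. MSVC uses `_M_ARM64` instead and is not handled here. `EXAMPLE_HAVE_NEON` and `AddBytes16` are hypothetical placeholders, not Snappy's actual configuration macro or code. Because Neon is part of the AArch64 baseline, the check needs no runtime CPU detection. Every other target takes the scalar fallback.

```cpp
#include <cstdint>

// Hypothetical switch: Neon is only assumed on 64-bit ARM, where it is always present.
#if defined(__aarch64__)
#include <arm_neon.h>
#define EXAMPLE_HAVE_NEON 1
#else
#define EXAMPLE_HAVE_NEON 0
#endif

// Adds two 16-byte arrays lane by lane, wrapping on overflow.
void AddBytes16(const uint8_t *a, const uint8_t *b, uint8_t *out) {
#if EXAMPLE_HAVE_NEON
	// One 128-bit load per input, one vector add, one store.
	uint8x16_t va = vld1q_u8(a);
	uint8x16_t vb = vld1q_u8(b);
	vst1q_u8(out, vaddq_u8(va, vb));
#else
	// Portable scalar fallback with identical results.
	for (int i = 0; i < 16; i++) {
		out[i] = static_cast<uint8_t>(a[i] + b[i]);
	}
#endif
}
```

Both branches produce the same output, so callers do not need to know which one was compiled in.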

lnkuiper · 2024-10-08T11:10:53Z

Regression test failures are unrelated

Mytherin · 2024-10-21T11:18:51Z

Thanks! Looks great

lnkuiper added 21 commits September 13, 2024 09:12

- snappy intrinsics (fc1cb80)
- Merge branch 'main' into snappy_stuff (5a67a6a)
- check if arm (32053ff)
- Merge branch 'main' into snappy_stuff (4ce3b51)
- only use neon on aarch64 (ea5112f)
- define in both new/old version (e79c4b5)
- only do aarch64 stuff (f864967)
- Merge branch 'feature' into snappy_stuff (7c27adc)
- implement delta binary packed (31aadc4)
- use our own bitpacking code for dbp (0756d65)
- separate count and block count (c69edc7)
- more rigorous type checking for zigzag (b4e4b9f)
- fix assert (3ac7b6b)
- Merge branch 'feature' into snappy_stuff (8c90118)
- don't skip first value for stats (ad2dd2c)
- cover some edge cases (5c1ad40)
- some verification code for the dbp encoder and fix test (e7b0bae)
- cast to double to bypass compression (e30f5e5)
- Merge branch 'feature' into snappy_stuff (76bc0c8)
- reimplement dbp decoder too (0d33246)
- fix issues with reworked dbp decoder (21bf90b)

lnkuiper added the Ready For Review label Oct 8, 2024

Mytherin mentioned this pull request Oct 10, 2024 in "Too large parquet files via "COPY TO"" #3316 (open)

Mytherin merged commit 5dd4deb into duckdb:feature Oct 21, 2024
41 of 42 checks passed

lnkuiper deleted the snappy_stuff branch April 14, 2025 09:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement `DELTA_BINARY_PACKED` compression in Parquet writer #14257

Implement `DELTA_BINARY_PACKED` compression in Parquet writer #14257

Uh oh!

lnkuiper commented Oct 7, 2024

Uh oh!

lnkuiper commented Oct 8, 2024

Uh oh!

Uh oh!

Mytherin commented Oct 21, 2024

Uh oh!

Uh oh!

Implement DELTA_BINARY_PACKED compression in Parquet writer #14257

Implement DELTA_BINARY_PACKED compression in Parquet writer #14257

Uh oh!

Conversation

lnkuiper commented Oct 7, 2024

Uh oh!

lnkuiper commented Oct 8, 2024

Uh oh!

Uh oh!

Mytherin commented Oct 21, 2024

Uh oh!

Uh oh!

Implement `DELTA_BINARY_PACKED` compression in Parquet writer #14257

Implement `DELTA_BINARY_PACKED` compression in Parquet writer #14257