[Parquet] Support for LZ4 Compression #11220
statement error
SELECT * FROM parquet_scan('data/parquet-testing/compression/generated/data_page=2_LZ4.parquet') limit 50
query IIII
SELECT * FROM parquet_scan('data/parquet-testing/compression/generated/data_page=2_ZSTD.parquet', hive_partitioning=0) limit 50
Unrelated to this PR, but @samansmink this seems like a bug in the hive partitioning. Equality in file names shouldn't be auto-detected (or even interpreted) as hive partitions, right?
No, I don't even think the filename should ever be able to contain an encoded partition, let alone be auto-detected as such.
I imagine this stems from #7344 then?
That seems to have added the work-around for these files, but file names shouldn't be considered hive partitions in any case, not even when hive partitioning is explicitly enabled.
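To make the point in this thread concrete, here is a minimal Python sketch of the two behaviours. It is not DuckDB's implementation; the function names `partitions_naive` and `partitions_dirs_only` are hypothetical and only illustrate why a file name such as `data_page=2_LZ4.parquet` can be mistaken for a hive partition when every path component is inspected.

```python
from pathlib import PurePosixPath


def _key_values(components):
    """Collect key=value pairs from an iterable of path components."""
    return dict(c.split("=", 1) for c in components if "=" in c)


def partitions_naive(path):
    # Hypothetical: inspects every component, including the file name,
    # so an '=' in the file name is read as a partition.
    return _key_values(PurePosixPath(path).parts)


def partitions_dirs_only(path):
    # Hypothetical: inspects directory components only and ignores the
    # final component (the file name).
    return _key_values(PurePosixPath(path).parts[:-1])


test_file = "data/parquet-testing/compression/generated/data_page=2_LZ4.parquet"
print(partitions_naive(test_file))      # {'data_page': '2_LZ4.parquet'}
print(partitions_dirs_only(test_file))  # {}
```

Under this sketch, a genuinely partitioned path such as `tbl/year=2024/file.parquet` yields the same result from both functions, so restricting detection to directories only changes the outcome for file names that happen to contain `=`.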
Thanks for the PR! Looks good to me - just seems like the LZ4 symbols need to be namespaced to fix the symbol leakage test.
Merge pull request duckdb/duckdb#11220 from hannes/parquetlz4
This PR adds support for LZ4 compression to the Parquet reader and writer. As per the Parquet compression spec, we implement only the newer LZ4_RAW scheme and not the deprecated LZ4 scheme. However, we allow both variants to be used in COPY commands because the subtle difference is reasonably lost on users: both names will mean the same, and LZ4_RAW will be used.
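The aliasing described above can be sketched as follows. This is a hypothetical illustration in Python rather than the code in this PR: the names `CODEC_ALIASES`, `SUPPORTED_CODECS` and `resolve_codec` are made up for the example, and the set of supported codecs is an assumption, not an authoritative list.

```python
# Hypothetical alias table: the deprecated name is accepted on input
# but always resolved to the newer scheme.
CODEC_ALIASES = {"LZ4": "LZ4_RAW"}

# Illustrative set only; the real writer may support other codecs.
SUPPORTED_CODECS = {"UNCOMPRESSED", "SNAPPY", "GZIP", "ZSTD", "LZ4_RAW"}


def resolve_codec(name):
    """Map a user-supplied codec name to the codec that gets written."""
    key = name.strip().upper()
    key = CODEC_ALIASES.get(key, key)
    if key not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported Parquet codec: {name!r}")
    return key


print(resolve_codec("lz4"))      # LZ4_RAW
print(resolve_codec("LZ4_RAW"))  # LZ4_RAW
```

Resolving the alias once, at option-parsing time, means the rest of the writer only ever has to deal with a single LZ4 variant.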