lnkuiper
Contributor

Follow-up to #16243. Scraping the bottom of the barrel here, as the previous PR already captured most of the biggest performance gains.

This PR adds some more fast paths for the case where there are no NULLs, and implements a branchless hash function for inlined string_t values. This required some extra care to make sure that the hash function returns the same value whether or not the string is inlined.
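To illustrate the consistency requirement, here is a minimal sketch, not the code from this PR. It assumes a simplified `InlinedString` layout (a length plus 12 zero-padded bytes) standing in for an inlined string_t. `Mix`, `kSeed`, `MakeInlined`, `HashInlined`, and `HashBytes` are hypothetical names invented for this example. The inlined path always reads the same fixed-size words, so it has no length-dependent branches. The generic path routes short inputs through the same zero-padded layout so both paths agree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an inlined string: up to 12 bytes stored in place.
constexpr std::size_t kInlineLength = 12;
constexpr std::uint64_t kSeed = 0x9e3779b97f4a7c15ULL; // arbitrary constant (assumption)

struct InlinedString {
    std::uint32_t length;
    char data[kInlineLength]; // bytes past `length` must be zero
};

// Build a zero-padded inlined string from raw bytes (len must fit inline).
inline InlinedString MakeInlined(const char *ptr, std::size_t len) {
    assert(len <= kInlineLength);
    InlinedString s{}; // value-initialization zeroes the padding
    s.length = static_cast<std::uint32_t>(len);
    std::memcpy(s.data, ptr, len);
    return s;
}

// Simple invertible 64-bit mixer (illustrative, not a vetted hash).
inline std::uint64_t Mix(std::uint64_t h) {
    h ^= h >> 32;
    h *= 0xd6e8feb86659fd93ULL;
    h ^= h >> 32;
    return h;
}

// Branchless path: always loads one 8-byte and one 4-byte word,
// regardless of the actual length, relying on the zero padding.
inline std::uint64_t HashInlined(const InlinedString &s) {
    std::uint64_t lo;
    std::uint32_t hi;
    std::memcpy(&lo, s.data, 8);
    std::memcpy(&hi, s.data + 8, 4);
    std::uint64_t h = Mix(kSeed ^ s.length);
    h = Mix(h ^ lo);
    h = Mix(h ^ hi);
    return h;
}

// Generic path for raw bytes. Short inputs reuse the inlined layout so the
// result matches HashInlined; longer inputs are hashed in 8-byte chunks.
inline std::uint64_t HashBytes(const char *ptr, std::size_t len) {
    if (len <= kInlineLength) {
        return HashInlined(MakeInlined(ptr, len));
    }
    std::uint64_t h = Mix(kSeed ^ len);
    std::size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        std::uint64_t w;
        std::memcpy(&w, ptr + i, 8);
        h = Mix(h ^ w);
    }
    if (i < len) {
        std::uint64_t w = 0; // zero-pad the final partial chunk
        std::memcpy(&w, ptr + i, len - i);
        h = Mix(h ^ w);
    }
    return h;
}
```

With this arrangement, `HashInlined(MakeInlined(p, n))` and `HashBytes(p, n)` agree for every `n <= 12` by construction. The cost is a single length check on the generic path.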

Overall, the changes reduce the time it takes to write TPC-H SF10 lineitem to Parquet from ~2.6s to ~2.4s with the default PARQUET_VERSION V1, and from ~2.5s to ~2.3s with V2.

@Mytherin Mytherin merged commit 219bafa into duckdb:main Feb 18, 2025
49 checks passed
@Mytherin
Collaborator

Thanks!

Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Mar 4, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)