Mytherin (Collaborator) commented Apr 15, 2024

#10756 introduced a limit on the number of threads used for Parquet writes, based on the available memory and the width of the table: 4MB per column per thread. The same limit also turns out to be beneficial for batched writes, especially for wide tables. Below is a benchmark of copying a table with 3000 columns into a DuckDB table using various thread counts and memory limits. As shown, limiting the number of threads is beneficial when the memory limit is low, since less data is written to the temporary directory.

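| Memory Limit | 1 thread | 2 threads | 10 threads |
|--------------|----------|-----------|------------|
| 16GB         | 440s     | 913s      | 1014s      |
| 32GB         | 438s     | 594s      | 708s       |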
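
To make the 4MB per column per thread rule concrete, here is a minimal sketch of the arithmetic in Python. It is not DuckDB's actual implementation: the helper name `max_write_threads`, the use of binary units (MiB/GiB), and the floor of one thread are all assumptions made for illustration.

```python
# 4MB per column per thread, as described for #10756 (binary units assumed here).
BYTES_PER_COLUMN_PER_THREAD = 4 * 1024 * 1024


def max_write_threads(memory_limit_bytes: int, column_count: int, requested_threads: int) -> int:
    """Hypothetical helper: cap the thread count so every thread gets 4MB per column."""
    if column_count <= 0 or requested_threads <= 0:
        raise ValueError("column_count and requested_threads must be positive")
    # Memory one writer thread is assumed to need for a table this wide.
    bytes_per_thread = column_count * BYTES_PER_COLUMN_PER_THREAD
    # Number of threads the memory limit can accommodate.
    affordable_threads = memory_limit_bytes // bytes_per_thread
    # Assumption: always allow at least one thread, never more than requested.
    return max(1, min(requested_threads, affordable_threads))


if __name__ == "__main__":
    GIB = 1024 ** 3
    for limit_gib in (16, 32):
        threads = max_write_threads(limit_gib * GIB, column_count=3000, requested_threads=10)
        print(f"{limit_gib}GB limit, 3000 columns -> {threads} thread(s)")
```

Under these assumptions, a 3000-column table needs roughly 11.7 GiB per thread, so the sketch yields 1 thread for the 16GB limit and 2 threads for the 32GB limit, even when 10 threads are requested.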

Mytherin merged commit 076daa9 into duckdb:main Apr 15, 2024
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Apr 15, 2024
Merge pull request duckdb/duckdb#11655 from Mytherin/limitbatchinsertthreads
Mytherin deleted the limitbatchinsertthreads branch June 7, 2024 12:52