Add missing global options to Python's write_parquet
#14766
Merged
+196
−11
This PR adds more options to DuckDB's Python Relational API for `write_parquet`, matching the `COPY TO` options, addressing #8896 (a usage sketch follows the list):

- `partition_by`
- `write_partition_columns`
- `overwrite`
- `per_thread_output`
- `use_tmp_file`
- `append`
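A minimal usage sketch of the new keyword arguments; the sample relation, output path, and argument values are illustrative only, and the exact accepted types (e.g. string vs. list for `partition_by`) are assumptions:

```python
import duckdb

# Any relation produced by the Relational API works; this one is made up.
rel = duckdb.sql("SELECT range AS i, range % 3 AS bucket FROM range(100)")

# Write a Hive-partitioned Parquet dataset, overwriting output from a
# previous run and omitting the partition column from the data files.
rel.write_parquet(
    "out/parquet_dataset",
    partition_by=["bucket"],
    overwrite=True,
    write_partition_columns=False,
)
```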
I would also like to note that the `overwrite` option that was added in the `to_csv` function (#10382) technically passes `overwrite_or_ignore` to the underlying engine (see `duckdb/tools/pythonpkg/src/pyrelation.cpp`, lines 1291 to 1296 at fd5de06). In order to match this behavior, I've also implemented it the same way.
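For context, a rough sketch of the SQL-side equivalent under that assumption (the mapping of the Python flag to the `OVERWRITE_OR_IGNORE` copy option follows from the note above; the table and output path are made up):

```python
import duckdb

duckdb.sql("CREATE TABLE t AS SELECT range AS i, range % 2 AS p FROM range(10)")

# Passing overwrite=True from Python corresponds to OVERWRITE_OR_IGNORE
# here, not the stricter OVERWRITE copy option.
duckdb.sql(
    "COPY t TO 'out_dir' (FORMAT PARQUET, PARTITION_BY (p), OVERWRITE_OR_IGNORE 1)"
)
```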
Changing it to pass `overwrite` and introducing `overwrite_or_ignore` as a separate option would be a breaking change, thus I've avoided doing it.

I've also improved the `test_to_parquet` tests by introducing new tests for the above-mentioned flags, as well as parameterizing the Pandas engine (similar to the `test_to_csv` tests, using both `NumpyPandas` and `ArrowPandas`); a sketch of that pattern is shown below.
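A sketch of roughly what the parameterization looks like; the fixture import path, test name, and assertion are assumptions rather than a copy of the actual tests:

```python
import duckdb
import pytest

# NumpyPandas/ArrowPandas wrap pandas backed by NumPy and Arrow,
# respectively; importing them from the shared conftest is assumed here.
from conftest import ArrowPandas, NumpyPandas


@pytest.mark.parametrize("pandas", [NumpyPandas(), ArrowPandas()])
def test_write_parquet_partition_by(pandas, tmp_path):
    df = pandas.DataFrame({"i": [1, 2, 3, 4], "bucket": [0, 1, 0, 1]})
    rel = duckdb.from_df(df)

    # Exercise the newly exposed options; overwrite=True lets the test
    # rerun against an existing directory.
    rel.write_parquet(str(tmp_path), partition_by=["bucket"], overwrite=True)

    roundtrip = duckdb.sql(
        f"SELECT * FROM read_parquet('{tmp_path}/**/*.parquet', hive_partitioning = true)"
    )
    assert len(roundtrip.fetchall()) == 4
```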
This PR also makes the Python stubs for `{to,write}_{csv,parquet}` both match, as they are technically aliases.