@fr3fou fr3fou commented Nov 9, 2024

This PR adds more options to DuckDB's Python Relational API for write_parquet, matching the COPY TO options, addressing #8896:

  • partition_by
  • write_partition_columns
  • overwrite
  • per_thread_output
  • use_tmp_file
  • append
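As a rough illustration of how these options might be used from Python, here is a minimal sketch. It assumes a DuckDB build that includes this PR; the helper names `write_partitioned` and `demo`, the `parity` column, and the output path are made up for the example.

```python
import os
import tempfile


def write_partitioned(rel, out_dir):
    """Write `rel` as Hive-partitioned Parquet files under `out_dir`.

    `rel` is expected to be a DuckDB relation. The keyword arguments are
    the options listed above; the chosen values are illustrative only.
    """
    rel.write_parquet(
        out_dir,
        partition_by=["parity"],       # one sub-directory per distinct value
        write_partition_columns=True,  # also keep `parity` inside the files
        overwrite=True,                # forwarded as overwrite_or_ignore
    )


def demo():
    # Imported lazily so the helper above has no hard dependency on duckdb.
    import duckdb

    # Hypothetical sample data: ten rows split into two partitions.
    rel = duckdb.sql("SELECT i AS id, i % 2 AS parity FROM range(10) t(i)")
    out_dir = os.path.join(tempfile.mkdtemp(), "out")
    write_partitioned(rel, out_dir)

    # List whatever files the writer produced.
    for root, _dirs, files in os.walk(out_dir):
        for name in files:
            print(os.path.join(root, name))
```

Calling `demo()` should write the partitioned output to a temporary directory and print the resulting file paths.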

I would also like to note that the overwrite option that was added in the to_csv function (#10382) technically passes overwrite_or_ignore to the underlying engine:

if (!py::none().is(overwrite)) {
    if (!py::isinstance<py::bool_>(overwrite)) {
        throw InvalidInputException("to_csv only accepts 'overwrite' as a boolean");
    }
    options["overwrite_or_ignore"] = {Value::BOOLEAN(py::bool_(overwrite))};
}

To match this behavior, I've implemented overwrite for write_parquet the same way.
Changing it to pass overwrite and introducing overwrite_or_ignore as a separate option would be a breaking change, so I've avoided doing it.
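For readers less familiar with the pybind code, the mapping can be modelled in plain Python. This is only a sketch of the logic in the snippet above, not DuckDB code: the function name `build_overwrite_option` is hypothetical, and `TypeError` stands in for DuckDB's `InvalidInputException`.

```python
def build_overwrite_option(overwrite=None):
    """Model of how the Python-level `overwrite` argument is forwarded.

    Returns the options dict that would be handed to the engine.
    """
    options = {}
    if overwrite is not None:
        # Only real booleans are accepted, mirroring the isinstance check.
        if not isinstance(overwrite, bool):
            raise TypeError("to_csv only accepts 'overwrite' as a boolean")
        # The user-facing name is `overwrite`, but the engine receives
        # `overwrite_or_ignore`.
        options["overwrite_or_ignore"] = overwrite
    return options
```

With this model, `build_overwrite_option(True)` yields an options dict keyed by `overwrite_or_ignore`, and omitting the argument leaves the options empty.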

I've also extended the test_to_parquet tests with new cases for the above-mentioned flags and parameterized them over the Pandas engine (similar to the test_to_csv tests, which use both NumpyPandas and ArrowPandas).

This PR also makes the Python stubs for {to,write}_{csv,parquet} match each other, since to_csv/write_csv and to_parquet/write_parquet are technically aliases.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft November 9, 2024 12:16
@fr3fou fr3fou marked this pull request as ready for review November 9, 2024 12:16

@Tishj Tishj left a comment

Thanks!

@Mytherin Mytherin merged commit 1aa2a7c into duckdb:main Nov 11, 2024
19 checks passed
@fr3fou fr3fou deleted the python-api-write-options branch November 11, 2024 09:04
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 21, 2024
Add operator name to profiling output (duckdb/duckdb#14744)
Add missing global options to Python's `write_parquet` (duckdb/duckdb#14766)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 21, 2024
Add operator name to profiling output (duckdb/duckdb#14744)
Add missing global options to Python's `write_parquet` (duckdb/duckdb#14766)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>