Skip to content

Conversation

pdet
Copy link
Collaborator

@pdet pdet commented Mar 17, 2025

As suggested in this issue, I've added tests using the same list of files that Nanoarrow currently tests with.

Some files are not yet supported, but the reasons are related to DuckDB's core Arrow integration rather than this extension.

The tests cover both the file reader/writer and the buffer reader/writer, validating all results by reading the files directly with PyArrow.

Files Not Currently Tested

generated_decimal256.stream: This type is not supported in DuckDB's core.

generated_interval.stream: DuckDB stores time as us, which causes a multiplication overflow at this precision, leading to errors.

generated_union.stream: This type is not supported in DuckDB's core.

Specifically for the Buffer Writer

generated_map_non_canonical.stream and generated_map.stream: The schema of the key is not set to NOT NULL in DuckDB's core, which later causes a validation failure.

generated_null.stream: A cast is missing in the core integration.

Otherwise, this PR mainly includes fixes to ensure all files pass correctly.

I think this was the last step before submitting this extension to the community, what do you think @paleolimbot?

Then for v1.3, I'll adjust the buffer call as described in #9

@pdet pdet requested a review from paleolimbot March 17, 2025 14:10
Copy link
Owner

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The only C++ change you might want to consider is dropping support for pre-1.0.0 IPC files and I can circle back to the Python tests to make them a bit more pytesty 🙂 .

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would benefit from a formatter (I'll circle back and add ruff to pre-commit)

@pdet
Copy link
Collaborator Author

pdet commented Mar 17, 2025

Thanks for the quick look @paleolimbot

I'll leave the Python test refactoring to you then! Will merge this later if CI passes :-).

@pdet pdet merged commit 13cacb9 into paleolimbot:main Mar 17, 2025
27 checks passed
Mytherin added a commit to duckdb/duckdb that referenced this pull request Mar 21, 2025
…write Null columns to arrow (#16711)

This PR fixes the issues described in the title and encountered in:
paleolimbot/duckdb-nanoarrow#10

Fix: duckdblabs/duckdb-internal#4451

I've tagged the merge to `v1.2` for simplicity, but this can also be
retargeted to `main`, I don't have a strong preference.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants