Skip to content

Conversation

Mause
Copy link
Member

@Mause Mause commented Mar 23, 2024

Fixes two small issues I found while adding nested type support to duckdb-rs

@Tishj
Copy link
Contributor

Tishj commented Mar 23, 2024

You should probably look at feature to see if there are conflicts there, we have made some changes to the Arrow code on feature

@Mause
Copy link
Member Author

Mause commented Mar 24, 2024

You should probably look at feature to see if there are conflicts there, we have made some changes to the Arrow code on feature

Looks like there will be a single line conflict yeah

@Mause Mause requested review from Mytherin and Tishj March 24, 2024 02:43
result->n_buffers = 2;
result->buffers[1] = append_data.main_buffer.data();
result->n_buffers = 1;
result->buffers[0] = append_data.main_buffer.data();
Copy link
Contributor

@Tishj Tishj Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the first buffer always occupied by the validity data?
I'm not sure I understand this change

//===--------------------------------------------------------------------===//
ArrowArray *ArrowAppender::FinalizeChild(const LogicalType &type, unique_ptr<ArrowAppendData> append_data_p) {
	auto result = make_uniq<ArrowArray>();

	auto &append_data = *append_data_p;
    ...
	result->buffers[0] = append_data.validity.data();

Copy link
Member Author

@Mause Mause Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All array types, with the exception of union types (more on these later), utilize a dedicated memory buffer, known as the validity (or “null”) bitmap, to encode the nullness or non-nullness of each value slot. The validity bitmap must be large enough to have at least 1 bit for each array slot.

https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps

Unlike other data types, unions do not have their own validity bitmap. Instead, the nullness of each slot is determined exclusively by the child arrays which are composed to create the union.

Copy link
Member Author

@Mause Mause Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tripped me up as well when I wrote the original implementation, but I guess most readers are pretty forgiving (except for the arrow-rs one)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@Tishj Tishj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for fixing this!

@Mytherin Mytherin merged commit f4f6e71 into duckdb:main Mar 25, 2024
@Mytherin
Copy link
Collaborator

Thanks!

@Mause Mause deleted the bugfix/arrow-union branch March 27, 2024 16:54
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 28, 2024
Merge pull request duckdb/duckdb#11326 from Mause/bugfix/arrow-union
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants