-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix(arrow): union buffer count & handle schema errors #11326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
You should probably look at |
Looks like there will be a single line conflict yeah |
result->n_buffers = 2; | ||
result->buffers[1] = append_data.main_buffer.data(); | ||
result->n_buffers = 1; | ||
result->buffers[0] = append_data.main_buffer.data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the first buffer always occupied by the validity data?
I'm not sure I understand this change
//===--------------------------------------------------------------------===//
ArrowArray *ArrowAppender::FinalizeChild(const LogicalType &type, unique_ptr<ArrowAppendData> append_data_p) {
auto result = make_uniq<ArrowArray>();
auto &append_data = *append_data_p;
...
result->buffers[0] = append_data.validity.data();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All array types, with the exception of union types (more on these later), utilize a dedicated memory buffer, known as the validity (or “null”) bitmap, to encode the nullness or non-nullness of each value slot. The validity bitmap must be large enough to have at least 1 bit for each array slot.
https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps
Unlike other data types, unions do not have their own validity bitmap. Instead, the nullness of each slot is determined exclusively by the child arrays which are composed to create the union.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tripped me up as well when I wrote the original implementation, but I guess most readers are pretty forgiving (except for the arrow-rs one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah actually, yea this makes sense: https://arrow.apache.org/docs/format/Columnar.html#buffer-listing-for-each-layout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for fixing this!
Thanks! |
Merge pull request duckdb/duckdb#11326 from Mause/bugfix/arrow-union
Fixes two small issues I found while adding nested type support to duckdb-rs