feat(array): update String array type #5545
Merged
Update Flux's String array type to be either an Arrow Binary array or an Arrow Dictionary array with Binary values. This removes the single-string variant, which was not Arrow-compatible, and uses a dictionary instead to provide the low-memory representation for repeated values. A dictionary is a more general-purpose implementation of the same idea: avoid storing identical values repeatedly.
The StringBuilder still switches to a standard Binary array once a second unique String value is observed, but it does not do so merely because a NULL is appended to the array. In the future, the heuristic could be changed to provide memory-efficient string representations in other contexts.
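The heuristic described above can be illustrated with a minimal, standard-library-only sketch. This is not the Flux implementation and does not use the Arrow API. The `stringBuilder` type, its fields, and its methods are hypothetical names chosen for illustration, and the "dictionary" state is simplified to a single remembered value plus a validity bitmap rather than real dictionary indices.

```go
package main

import "fmt"

// stringBuilder is a hypothetical, simplified stand-in for the builder
// described in this PR. It is NOT the Flux or Arrow type.
type stringBuilder struct {
	unique *string  // the only distinct value seen so far (nil if none yet)
	valid  []bool   // one entry per row; false marks a NULL
	values []string // filled only after switching to the plain representation
	plain  bool     // true once a second distinct value has been observed
}

// Append adds a non-NULL value. Observing a second distinct value
// materializes all earlier rows and switches to the plain representation.
func (b *stringBuilder) Append(v string) {
	if !b.plain {
		if b.unique == nil {
			b.unique = &v
		} else if *b.unique != v {
			b.values = make([]string, len(b.valid))
			for i, ok := range b.valid {
				if ok {
					b.values[i] = *b.unique
				}
			}
			b.plain = true
		}
	}
	if b.plain {
		b.values = append(b.values, v)
	}
	b.valid = append(b.valid, true)
}

// AppendNull adds a NULL row. It never changes the representation.
func (b *stringBuilder) AppendNull() {
	if b.plain {
		b.values = append(b.values, "")
	}
	b.valid = append(b.valid, false)
}

// IsDictionary reports whether the compact representation is still in use.
func (b *stringBuilder) IsDictionary() bool { return !b.plain }

func main() {
	var b stringBuilder
	b.Append("cpu")
	b.AppendNull()
	b.Append("cpu")
	fmt.Println(b.IsDictionary()) // true: one distinct value plus a NULL
	b.Append("mem")
	fmt.Println(b.IsDictionary()) // false: a second distinct value was seen
}
```

In this sketch the NULL only flips a validity bit, so the compact form survives it. Only a genuinely different value forces the rows to be written out individually.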
Moving to a completely Arrow-compatible interface makes the String array type much less fragile. The String array can now be used in any context that accepts an Arrow Array, and the special-case code previously required to split a String array is removed.
Checklist
Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.
experimental/
docs/Spec.md has been updated

Dear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.