Open
Labels: enhancement (any new improvement worthy of an entry in the changelog)
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Arrow has added support for run-end encoded (REE) arrays (apache/arrow#14176). Like dictionary arrays, REE arrays encode repeated values in a space-efficient manner that also allows fast processing.
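To make the encoding concrete, here is a minimal standalone sketch of the idea: each run of equal values is stored once, alongside the logical index at which the run ends. The `RunEndEncoded` struct and the `encode` and `get` functions are hypothetical names chosen for illustration. They are not the arrow-rs API, and the sketch ignores nulls and run-end types other than `i32`.

```rust
/// Hypothetical, simplified run-end encoded column (not the arrow-rs type).
#[derive(Debug, PartialEq)]
struct RunEndEncoded<T> {
    /// Exclusive logical end index of each run; strictly increasing.
    run_ends: Vec<i32>,
    /// One value per run.
    values: Vec<T>,
}

/// Encode a slice by collapsing consecutive equal values into runs.
fn encode<T: PartialEq + Clone>(input: &[T]) -> RunEndEncoded<T> {
    let mut run_ends: Vec<i32> = Vec::new();
    let mut values: Vec<T> = Vec::new();
    for (i, v) in input.iter().enumerate() {
        if values.last() == Some(v) {
            // Same value as the current run: extend it by moving its end.
            *run_ends.last_mut().unwrap() = (i + 1) as i32;
        } else {
            // A new value starts a new run ending just after index `i`.
            values.push(v.clone());
            run_ends.push((i + 1) as i32);
        }
    }
    RunEndEncoded { run_ends, values }
}

impl<T> RunEndEncoded<T> {
    /// Logical length: the end of the last run, or 0 when empty.
    fn len(&self) -> usize {
        self.run_ends.last().map_or(0, |&end| end as usize)
    }

    /// Look up a logical index by binary-searching for the first run
    /// whose end is greater than `index`.
    fn get(&self, index: usize) -> Option<&T> {
        if index >= self.len() {
            return None;
        }
        let physical = self
            .run_ends
            .partition_point(|&end| (end as usize) <= index);
        self.values.get(physical)
    }
}
```

Under these assumptions, encoding `["a", "a", "a", "b", "c", "c"]` stores three values with run ends `[3, 4, 6]`, and a random access costs one binary search over the run ends rather than a scan.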
Describe the solution you'd like
Implement REE in arrow-rs. Likely candidates:
- Support in DataType
- Support in ArrayData
- New REE array
- Support REE in IPC
- Support REE in cast kernels
- Support REE in compute kernels
Describe alternatives you've considered
Remaining tasks:
- arrow-row: Add support for REE #7649
- arrow-select: Implement concat for `RunArray`s #7487
- arrow-data: Add REE support for `build_extend` and `build_extend_nulls` #7671
- Implement `PartialEq` for RunArray #7691
- Reduce repetition in tests for arrow-row/src/run.rs #7692
- Improve performance of RunArray --> Row conversion #7693
- Potential Optimization for interleave/take on RunEndEncoded arrays #7710
- Implemented casting for RunEnd Encoding #7713
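
As an illustration of what the concat task involves, the sketch below concatenates two run-end encoded columns without expanding them: the second input's run ends are shifted by the logical length of the first, and a run that continues across the boundary is merged. The function `concat_ree` and its plain-slice signature are assumptions made for this example. It is not the arrow-select implementation, and it ignores nulls, slicing offsets, and `i32` overflow.

```rust
/// Hypothetical helper: concatenate two REE columns, each given as
/// parallel `run_ends` / `values` slices, returning the combined pair.
fn concat_ree<T: PartialEq + Clone>(
    a_ends: &[i32],
    a_vals: &[T],
    b_ends: &[i32],
    b_vals: &[T],
) -> (Vec<i32>, Vec<T>) {
    // Logical length of `a`; every run end in `b` is shifted by this amount.
    let offset = a_ends.last().copied().unwrap_or(0);
    let mut ends = a_ends.to_vec();
    let mut vals = a_vals.to_vec();
    for (end, val) in b_ends.iter().zip(b_vals) {
        let shifted = end + offset;
        if vals.last() == Some(val) {
            // The previous run has the same value, so extend it instead of
            // starting a new run. For well-formed input this only happens
            // at the boundary between `a` and `b`.
            *ends.last_mut().unwrap() = shifted;
        } else {
            ends.push(shifted);
            vals.push(val.clone());
        }
    }
    (ends, vals)
}
```

For example, concatenating run ends `[2, 5]` with values `["x", "y"]` and run ends `[1, 3]` with values `["y", "z"]` gives run ends `[2, 6, 8]` and values `["x", "y", "z"]`. Merging the boundary run is a design choice in this sketch; keeping the two `"y"` runs separate would also describe the same logical data.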
Additional context
Among other things, @brancz is working to improve aggregation performance in DataFusion using RunArrays, see