-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Improve performance of UNNEST/UNPIVOT
by using selection vectors to unnest multiple lists at once
#16210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+193
−255
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Performance numbers of
|
2 tasks
Antonov548
added a commit
to Antonov548/duckdb-r
that referenced
this pull request
Feb 27, 2025
Improve performance of `UNNEST/UNPIVOT` by using selection vectors to unnest multiple lists at once (duckdb/duckdb#16210) [fix] Use bigobj when building with MSVC (duckdb/duckdb#16200)
krlmlr
pushed a commit
to duckdb/duckdb-r
that referenced
this pull request
Mar 5, 2025
Improve performance of `UNNEST/UNPIVOT` by using selection vectors to unnest multiple lists at once (duckdb/duckdb#16210) [fix] Use bigobj when building with MSVC (duckdb/duckdb#16200)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 15, 2025
Improve performance of `UNNEST/UNPIVOT` by using selection vectors to unnest multiple lists at once (duckdb/duckdb#16210) [fix] Use bigobj when building with MSVC (duckdb/duckdb#16200)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 15, 2025
Improve performance of `UNNEST/UNPIVOT` by using selection vectors to unnest multiple lists at once (duckdb/duckdb#16210) [fix] Use bigobj when building with MSVC (duckdb/duckdb#16200)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 17, 2025
Improve performance of `UNNEST/UNPIVOT` by using selection vectors to unnest multiple lists at once (duckdb/duckdb#16210) [fix] Use bigobj when building with MSVC (duckdb/duckdb#16200)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 18, 2025
Improve performance of `UNNEST/UNPIVOT` by using selection vectors to unnest multiple lists at once (duckdb/duckdb#16210) [fix] Use bigobj when building with MSVC (duckdb/duckdb#16200)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The current
UNNEST
implementation handles one list a time. While that works fine for large lists, we often unnest small lists. One scenario where we unnest small lists is inUNPIVOT
, where the list size is equivalent to the amount of columns we are unpivoting. For example, the query:Is translated into the below query:
This frequently leads to unnesting of small lists (in the above example lists of size 2).
The current
UNNEST
implementation handles this inefficiently because it will, in this scenario, emit vectors of at most size 2. This PR reworks theUNNEST
to handle multiple lists at once. In addition, it also gets rid of the copying thatUNNEST
would previously do by instead rewritingUNNEST
to only work using selection vectors.Effectively we traverse the input lists and construct a selection vector with the rows we need from (1) the input payload, and (2) the list entries. We then slice the input/list children with these selection vectors.
When unnesting lists of different sizes we must add
NULL
entries. We do this as a separate pass - we slice the first row of the given list vector, and then callFlatten
+ assignNULL
to the entries for which this is required. This is slower - but still does not require any special code. The flattening could be avoided if the list child hasNULL
entries itself but that is not guaranteed, so for now this optimization is skipped. I also presume that unnesting multiple lists of different lengths is probably pretty rare (but have no data to back that up).Performance
Below is the performance of the given
UNPIVOT
query running on TPC-H SF10: