Contributor

@lnkuiper lnkuiper commented Nov 18, 2022

This PR implements parallel scanning of the result of the ORDER_BY operator. Preserving the order while reading sorted data has been made possible by adding batch indices to local scan states. It was very easy to implement with all this infrastructure in place. Thanks, @Mytherin!
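To illustrate the idea of batch indices, here is a minimal Python sketch, not DuckDB's actual implementation. The function name `scan_sorted_in_parallel` and the parameters `batch_size` and `workers` are hypothetical. Each worker tags the chunk it scans with a batch index, and the consumer reassembles the chunks by that index, so the sorted order survives even when workers finish out of order.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_sorted_in_parallel(sorted_rows, batch_size=4, workers=4):
    """Scan already-sorted rows in parallel and return them in sorted order.

    Hypothetical illustration only: names and structure are assumptions,
    not the code from this PR.
    """
    # Split the sorted result into batches; a batch's position is its batch index.
    batches = [
        sorted_rows[start:start + batch_size]
        for start in range(0, len(sorted_rows), batch_size)
    ]

    def scan_batch(batch_index):
        # Each worker copies its batch and reports which batch it scanned.
        return batch_index, list(batches[batch_index])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit in reverse to mimic workers producing batches out of order.
        futures = [
            pool.submit(scan_batch, index)
            for index in reversed(range(len(batches)))
        ]
        results = [future.result() for future in futures]

    # The consumer orders the batches by batch index, restoring the sort order.
    output = []
    for _, rows in sorted(results, key=lambda pair: pair[0]):
        output.extend(rows)
    return output


rows = sorted([5, 3, 9, 1, 7, 2, 8, 6, 4, 0])
assert scan_sorted_in_parallel(rows, batch_size=3) == rows
```

The point of the sketch is that order is a property of the batch index rather than of worker timing, so no locking between workers is needed to keep the output sorted.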

Quick benchmark, of course.

CALL dbgen(sf=1);
CREATE TEMPORARY TABLE test AS SELECT * FROM lineitem ORDER BY l_extendedprice;

Timing in seconds (on my laptop):

| Old   | New   |
|-------|-------|
| 2.117 | 0.845 |

As we can see, sorted tables can be created roughly 2.5x faster in this benchmark, which can help with zone maps, for example.

This also makes it more enticing to use the PhysicalOrder operator for index creation (CC @taniabogatsch)

Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Looks good to me. Could we add some tests, particularly around verifying that the results are actually stored in the correct order, also with persistent storage?

One way of testing this is to use window functions without an explicit order by, e.g.:

CREATE TABLE integers(i INTEGER);
INSERT INTO integers VALUES (1), (2), (3);
-- a row passes if it is the first row (lag(i) is NULL) or if "i" is at least as large as "i" in the previous row
-- the NOT keeps only the rows that fail this check, so the result should be empty
SELECT * FROM integers QUALIFY NOT (i >= lag(i) OVER () OR lag(i) OVER () IS NULL);
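To spell out the logic of that check, here is a plain-Python equivalent. It is only a sketch of the predicate, not DuckDB code. The function name `order_violations` is hypothetical, and it assumes the column holds no NULL values.

```python
def order_violations(values):
    """Return the values that are smaller than the value in the row before them.

    An empty result means the rows are stored in non-decreasing order.
    Hypothetical helper; assumes no None (NULL) values in the input.
    """
    violations = []
    previous = None  # plays the role of lag(i), which is NULL for the first row
    for value in values:
        # Mirrors NOT (i >= lag(i) OR lag(i) IS NULL)
        if previous is not None and not (value >= previous):
            violations.append(value)
        previous = value
    return violations


assert order_violations([1, 2, 3]) == []
assert order_violations([1, 3, 2]) == [2]
```

As with the SQL version, a test would assert that the result is empty after the sorted data has been written and read back.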

@Mytherin Mytherin changed the base branch from master to feature November 19, 2022 15:45
@lnkuiper
Contributor Author

This is ready to go. I will push the bugfix we discussed to master.

@Mytherin
Collaborator

Thanks!

@lnkuiper lnkuiper deleted the parallel_orderby_getdata branch May 1, 2023 13:48