Skip to content

Conversation

Tmonster
Copy link
Contributor

Fixes https://github.com/duckdblabs/duckdb-internal/issues/4363
Fixes #16527

TopN isn't always faster. The below experiments show that as the table cardinality increases/as the limit increases, TopN performance ends up suffering from nLog(n) operations performed one tuple at a time, and sorting the whole table becomes faster. The plots show without optimization (i.e sorting the whole table) as having relatively constant performance, while TopN performance degrades as the limit increases.

Looking at the plots, we can see that sorting the table is constant regardless of the limit. However, once the limit value is >= (0.007)*(table_cardinality), the topn optimization takes more time that a normal sort.

no-top_n_opt

The same plot with the changes from this PR that disables the TopN optimization once the limit value is >= 0.007 * table_cardinality (and the limit is > 5000).
with_topn_opt

With this PR, we now observe the following runtimes

Limit This branch v1.2.2
500 0.029 0.027
5000 0.036 0.040
50000 0.103 0.124
12800000 1.116 33.019

CC @lnkuiper (I think you still have a patch you want to add?)

@Mytherin Mytherin merged commit 59ef36b into duckdb:main Apr 17, 2025
46 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

@lnkuiper
Copy link
Contributor

@Tmonster My changes were merged in #17141, but I don't think the changes matter much for this PR. Maybe we should revisit when I rework sorting, but we'll cross that bridge when we get to it.

@Tmonster
Copy link
Contributor Author

I ran the test again and got the following plot. It looks like we turn off topN maybe a bit too soon? You can see for 5M, 10M and 20M the "with optimization" performance jumps straight to "without optimization". I think we can squeeze a bit more performance out in those cases
curr-main

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Only trigger TopN rewrite relatively small limits compared to the table size.  (duckdb/duckdb#17140)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Only trigger TopN rewrite relatively small limits compared to the table size.  (duckdb/duckdb#17140)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 19, 2025
Only trigger TopN rewrite relatively small limits compared to the table size.  (duckdb/duckdb#17140)
Mytherin added a commit that referenced this pull request May 28, 2025
Fixes a bug introduced by #17140

The computation was wrong because it did not check
`has_estimated_cardinality` before grabbing it. This PR computes the
cardinality if it was not yet there. These fields should probably be
changed to `optional_idx` in the future so this doesn't happen.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Top-N Optimisation degrades performance for large LIMITs
3 participants