Only trigger TopN rewrite for relatively small limits compared to the table size. #17140
Fixes https://github.com/duckdblabs/duckdb-internal/issues/4363
Fixes #16527
TopN isn't always faster. The experiments below show that as the table cardinality and the limit increase, TopN performance ends up suffering from O(n log n) operations performed one tuple at a time, and sorting the whole table becomes faster. The plots show that performance without the optimization (i.e., sorting the whole table) stays relatively constant, while TopN performance degrades as the limit increases.
Looking at the plots, we can see that sorting the table takes roughly constant time regardless of the limit. However, once the limit is >= 0.007 * table_cardinality, the TopN optimization takes more time than a normal sort.
The same plot with the changes from this PR, which disable the TopN optimization once the limit is >= 0.007 * table_cardinality (and the limit is > 5000).
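The heuristic described above can be sketched as follows. This is a hedged illustration, not DuckDB's actual code: the function name `ShouldUseTopN` and the two threshold constants are made up for clarity; the values 5000 and 0.007 come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the PR's heuristic (names are hypothetical):
// keep the TopN rewrite for small limits, but fall back to a full sort
// once the limit is both > 5000 in absolute terms and >= 0.7% of the
// table's cardinality.
static constexpr uint64_t ABSOLUTE_LIMIT_THRESHOLD = 5000;
static constexpr double RELATIVE_LIMIT_THRESHOLD = 0.007;

bool ShouldUseTopN(uint64_t limit, uint64_t table_cardinality) {
	// Small absolute limits always benefit from TopN.
	if (limit <= ABSOLUTE_LIMIT_THRESHOLD) {
		return true;
	}
	// Larger limits only benefit while they stay a small fraction of the table.
	return limit < RELATIVE_LIMIT_THRESHOLD * static_cast<double>(table_cardinality);
}
```

For example, a `LIMIT 100` on a million-row table still uses TopN, while a `LIMIT 80000` on a ten-million-row table (0.8% of the table) falls back to a full sort.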

With this PR, we now observe the following runtimes:
CC @lnkuiper (I think you still have a patch you want to add?)