Skip to content

Conversation

Mytherin
Copy link
Collaborator

@Mytherin Mytherin commented Sep 21, 2024

This PR adds the SMALLER_BINARY compilation flag that can be used to reduce the binary size. In many places in DuckDB, we generate code using templating to speed up execution, by e.g. providing specialized implementations for many different primitive types, or providing specialized implementations based on common compressed vector types. In many of these cases, we also have a generic fallback implementation that can be used (albeit with a small performance loss).

This PR hides many of these specialized implementations behind a #ifndef DUCKDB_SMALL_BINARY compilation flag. This allows builds where binary size is important choose to omit this extra generated code and always select the fallback implementation.

The goal is to not sacrifice any functionality so that the versions are fully compatible - but to rather achieve the same functionality in less code. In many cases the sort key functionality can be used to relatively efficiently support operations on all types without adding specialized code for all types. Since we already have this code path to deal with arbitrary types, using it in all cases is in many cases as simple as pushing an #ifdef around the specialized implementations.

This PR also has a side effect in that it improves testing of the fallback implementations - since the fallback implementation is now always used. In this process I already found two issues with the fallback implementation (one that is addressed in this PR, one that still needs to be addressed).

This PR moves the following implementations behind the DUCKDB_SMALL_BINARY flag:

  • Specialized code for flat vectors in UnaryExecutor/BinaryExecutor and DistinctFrom
  • Specialized code for executing BETWEEN in the WHERE clause (ExpressionExecutor::Select)
  • Use fallback implementation for binned_histogram, histogram, arg_min/arg_max, mode aggregates
  • Removes all window functions - instead falling back to the regular aggregate implementations when these aggregates are used in window functions

This has as effect that binary size is reduced by ~20%. We also add a CI run that runs all unit tests using the SMALLER_BINARY flag in the nightly CI.

The performance impact of the smaller binary is heavily workload dependent. For TPC-H, there is little performance impact. Simple aggregates have the potential to be more affected. For example - running an ungrouped arg_min(BIGINT, BIGINT) is slowed down by ~3x. Removing the specialized window functions might be even more impactful when they are used. That said, the significant decrease in binary size is likely worth it in the majority of cases.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 21, 2024 13:13
@Mytherin Mytherin marked this pull request as ready for review September 21, 2024 13:14
@Mytherin Mytherin requested a review from carlopi September 21, 2024 17:58
Copy link
Contributor

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool, thanks a lot.

Main comment would be whether this (possibly without the changes to tests, just to be on the safe side) can be already considered for inclusion in main branch, adding (if not already there) a big "EXPERIMENTAL" warning on the setting.
That would allow opt-in for this already in v1.1.1.


Another comment, that I am not 100% sure it makes sense, but I think it's worth raising, is whether it could make sense to add another layer so that changes in the code that go together are behind a common ifdef.

One pattern could be having:

#ifdef SMALLER_BINARY
#define SMALLER_BINARY_DISTINCT_ON
#endif

#ifdef SMALLER_BINARY_DISTICT_ON
//
#endif

or equivalent for SMALLER_BINARY_HISTOGRAM, SMALLER_BINARY_ARG_MIN_MAX, ....

Idea would be that behind the generic SMALLER_BINARY the whole set of changes are enabled, but more granular choices are possible for debugging purposes (say to track down which changes are connected to a given bug) or even for allowing people distributing DuckDB to decide (or even measure) what are the tradeoffs.

Another minor benefit is that we will be keeping related changes somewhat connected.

Example of use: say for example at some point some differences are detected between code-path SMALLER_BINARY=1 and the default with no SMALLER_BINARY, having the possibility at the [C]Make level to decide more granularly what path to select can be handy in determining which areas of the code influence a change in result.

This proposed chance, if it does makes sense, can also be considered at a later point.

@Mytherin Mytherin changed the base branch from feature to main September 22, 2024 20:48
@Mytherin Mytherin changed the base branch from main to feature September 22, 2024 20:49
@Mytherin
Copy link
Collaborator Author

Let's keep this in feature for now

@Mytherin Mytherin merged commit 7ac8b3e into duckdb:feature Sep 23, 2024
65 of 70 checks passed
carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024
carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024
carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024
carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024
carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024
carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024
carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024
carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants