Add `SMALLER_BINARY` flag to reduce binary size #14057

Mytherin · 2024-09-21T11:06:02Z

This PR adds the SMALLER_BINARY compilation flag that can be used to reduce the binary size. In many places in DuckDB, we generate code using templating to speed up execution, by e.g. providing specialized implementations for many different primitive types, or providing specialized implementations based on common compressed vector types. In many of these cases, we also have a generic fallback implementation that can be used (albeit with a small performance loss).

This PR hides many of these specialized implementations behind a #ifndef DUCKDB_SMALL_BINARY compilation flag. This allows builds where binary size is important choose to omit this extra generated code and always select the fallback implementation.

The goal is to not sacrifice any functionality so that the versions are fully compatible - but to rather achieve the same functionality in less code. In many cases the sort key functionality can be used to relatively efficiently support operations on all types without adding specialized code for all types. Since we already have this code path to deal with arbitrary types, using it in all cases is in many cases as simple as pushing an #ifdef around the specialized implementations.

This PR also has a side effect in that it improves testing of the fallback implementations - since the fallback implementation is now always used. In this process I already found two issues with the fallback implementation (one that is addressed in this PR, one that still needs to be addressed).

This PR moves the following implementations behind the DUCKDB_SMALL_BINARY flag:

Specialized code for flat vectors in UnaryExecutor/BinaryExecutor and DistinctFrom
Specialized code for executing BETWEEN in the WHERE clause (ExpressionExecutor::Select)
Use fallback implementation for binned_histogram, histogram, arg_min/arg_max, mode aggregates
Removes all window functions - instead falling back to the regular aggregate implementations when these aggregates are used in window functions

This has as effect that binary size is reduced by ~20%. We also add a CI run that runs all unit tests using the SMALLER_BINARY flag in the nightly CI.

The performance impact of the smaller binary is heavily workload dependent. For TPC-H, there is little performance impact. Simple aggregates have the potential to be more affected. For example - running an ungrouped arg_min(BIGINT, BIGINT) is slowed down by ~3x. Removing the specialized window functions might be even more impactful when they are used. That said, the significant decrease in binary size is likely worth it in the majority of cases.

… generic implementation

This reverts commit 714ab2e.

carlopi

This is very cool, thanks a lot.

Main comment would be whether this (possibly without the changes to tests, just to be on the safe side) can be already considered for inclusion in main branch, adding (if not already there) a big "EXPERIMENTAL" warning on the setting.
That would allow opt-in for this already in v1.1.1.

Another comment, that I am not 100% sure it makes sense, but I think it's worth raising, is whether it could make sense to add another layer so that changes in the code that go together are behind a common ifdef.

One pattern could be having:

#ifdef SMALLER_BINARY
#define SMALLER_BINARY_DISTINCT_ON
#endif

#ifdef SMALLER_BINARY_DISTICT_ON
//
#endif

or equivalent for SMALLER_BINARY_HISTOGRAM, SMALLER_BINARY_ARG_MIN_MAX, ....

Idea would be that behind the generic SMALLER_BINARY the whole set of changes are enabled, but more granular choices are possible for debugging purposes (say to track down which changes are connected to a given bug) or even for allowing people distributing DuckDB to decide (or even measure) what are the tradeoffs.

Another minor benefit is that we will be keeping related changes somewhat connected.

Example of use: say for example at some point some differences are detected between code-path SMALLER_BINARY=1 and the default with no SMALLER_BINARY, having the possibility at the [C]Make level to decide more granularly what path to select can be handy in determining which areas of the code influence a change in result.

This proposed chance, if it does makes sense, can also be considered at a later point.

Mytherin · 2024-09-23T07:57:13Z

Let's keep this in feature for now

From duckdb/duckdb#14057

Mytherin added 16 commits September 20, 2024 15:44

Add SMALLER_BINARY flag that produces a smaller DuckDB binary

91f8065

No between select in smaller binary mode

84a2376

Always take generic path in binary executor

dd77348

Remove unary flat update in unary executor

e82f0ae

Remove flat optimizations in unary aggregates

6793b30

Remove templated code for HAS_NULL/HAS_TRUE_SEL/HAS_FALSE_SEL

5b6f190

arg_min_max: fallback to generic implementation in small binary mode

0808555

mode: use generic function in smaller binary mode

4ada1b8

Remove window functions from quantile in smaller binary mode, and use…

28d2314

… generic implementation

Use generic histogram implementation in smaller binary mode

7d7f0b1

Format fix

e944ecf

binned_histogram: use generic function in smaller binary mode

caa2716

Remove interval specializations

714ab2e

Revert "Remove interval specializations"

303b041

This reverts commit 714ab2e.

Add smaller binary CI run

a2136c3

Reduce disk space requirement in test for CI

f6f1f9b

duckdb-draftbot marked this pull request as draft September 21, 2024 13:13

Mytherin marked this pull request as ready for review September 21, 2024 13:14

Mytherin requested a review from carlopi September 21, 2024 17:58

carlopi approved these changes Sep 22, 2024

View reviewed changes

Mytherin changed the base branch from feature to main September 22, 2024 20:48

Mytherin changed the base branch from main to feature September 22, 2024 20:49

Mytherin merged commit 7ac8b3e into duckdb:feature Sep 23, 2024
65 of 70 checks passed

carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add smaller_casts.patch

f85ab07

From duckdb/duckdb#14057

carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add is_distinct_from.patch

f59f90f

From duckdb/duckdb#14057

carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add binary_executor.patch

3199db7

From duckdb/duckdb#14057

carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add unary_executor.patch

64c96f6

From duckdb/duckdb#14057

carlopi mentioned this pull request Sep 25, 2024

Add patches from SMALLER_BINARY PR duckdb/duckdb-wasm#1864

Merged

carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add smaller_casts.patch

08ca5da

From duckdb/duckdb#14057

carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add is_distinct_from.patch

ccbefd7

From duckdb/duckdb#14057

carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add binary_executor.patch

9beae19

From duckdb/duckdb#14057

carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Sep 25, 2024

SMALLER_BINARY: Add unary_executor.patch

a4ecade

From duckdb/duckdb#14057

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `SMALLER_BINARY` flag to reduce binary size #14057

Add `SMALLER_BINARY` flag to reduce binary size #14057

Uh oh!

Mytherin commented Sep 21, 2024 •

edited

Loading

Uh oh!

carlopi left a comment

Uh oh!

Mytherin commented Sep 23, 2024

Uh oh!

Uh oh!

Uh oh!

Add SMALLER_BINARY flag to reduce binary size #14057

Add SMALLER_BINARY flag to reduce binary size #14057

Uh oh!

Conversation

Mytherin commented Sep 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carlopi left a comment

Choose a reason for hiding this comment

Uh oh!

Mytherin commented Sep 23, 2024

Uh oh!

Uh oh!

Uh oh!

Add `SMALLER_BINARY` flag to reduce binary size #14057

Add `SMALLER_BINARY` flag to reduce binary size #14057

Mytherin commented Sep 21, 2024 •

edited

Loading