Optimise division by a constant at runtime for integer division #10348

JAicewizard · 2024-01-25T22:25:18Z

When the right-hand side (rhs) of an integer division is known to be constant, the division can be optimized. libdivide performs this optimization at runtime, so similar optimizations apply even when the constant isn't known at compile time.

As the benchmarks below show, the code using libdivide runs more than twice as fast (timings of about 0.50 versus about 1.17). It should be even faster for unsigned integers.
It might be possible for the compiler to auto-vectorise the branchless version of libdivide, but I have not looked into that. Open for future research.
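
For readers unfamiliar with the technique, here is a minimal sketch of runtime division by a constant for `uint32_t`: a "magic" reciprocal is computed once per divisor, and each division then becomes a multiply and a shift. The `ConstDivider32` type and its members are hypothetical names for illustration, not libdivide's or DuckDB's API. The magic number follows the well-known `ceil(2^64 / d)` construction, and the divisor 1 is special-cased because its magic number would not fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not libdivide's or DuckDB's API.
struct ConstDivider32 {
    uint32_t divisor;
    uint64_t magic; // ceil(2^64 / divisor); unused when divisor == 1

    explicit ConstDivider32(uint32_t d) : divisor(d), magic(0) {
        assert(d != 0 && "division by zero must be handled by the caller");
        if (d > 1) {
            magic = UINT64_MAX / d + 1;
        }
    }

    // Computes value / divisor without a hardware divide instruction.
    uint32_t Divide(uint32_t value) const {
        if (divisor == 1) {
            return value; // the magic number for 1 would need 65 bits
        }
        // Take the high 64 bits of the 96-bit product magic * value,
        // using only 64-bit arithmetic by splitting magic into halves.
        uint64_t hi = magic >> 32;
        uint64_t lo = magic & 0xFFFFFFFFu;
        return static_cast<uint32_t>((hi * value + ((lo * value) >> 32)) >> 32);
    }
};
```

The divider would be constructed once per constant rhs and then reused for every row, so the cost of computing the magic number is amortised over the whole column.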

Issues with this code

I personally find the current method a bit messy, as it is the responsibility of the OPWRAPPER to decide what to optimize and what not. On one hand this allows the OP to be as generic as it was before. On the other hand, adding this libdivide optimization to more types will result in lots of duplicate code across the wrappers.

Benchmarks (run with turbo-boost disabled)

No optimisation:

name	run	timing
benchmark/micro/arithmetic/division_constrhs.benchmark	1	1.170270
benchmark/micro/arithmetic/division_constrhs.benchmark	2	1.173304
benchmark/micro/arithmetic/division_constrhs.benchmark	3	1.164200
benchmark/micro/arithmetic/division_constrhs.benchmark	4	1.179625
benchmark/micro/arithmetic/division_constrhs.benchmark	5	1.177252

Just performing row validity check based on the rhs (if possible):

name	run	timing
benchmark/micro/arithmetic/division_constrhs.benchmark	1	1.113453
benchmark/micro/arithmetic/division_constrhs.benchmark	2	1.130460
benchmark/micro/arithmetic/division_constrhs.benchmark	3	1.125331
benchmark/micro/arithmetic/division_constrhs.benchmark	4	1.134362
benchmark/micro/arithmetic/division_constrhs.benchmark	5	1.120623
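
As a rough illustration of what checking validity based on the rhs can mean, the sketch below hoists the zero check out of the per-row loop when the divisor is constant. It assumes that division by zero yields NULL, and it uses a hypothetical `UInt32Column` with a per-row validity flag rather than DuckDB's actual vector types. Unsigned values are used so that signed overflow does not need to be discussed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified column: values plus a per-row validity flag.
struct UInt32Column {
    std::vector<uint32_t> values;
    std::vector<bool> valid;
};

// Divides every valid row by a constant rhs.
// Assumption: a zero divisor turns the whole result into NULLs.
UInt32Column DivideByConstant(const UInt32Column &lhs, uint32_t rhs) {
    UInt32Column result;
    result.values.assign(lhs.values.size(), 0);
    if (rhs == 0) {
        // Checked once for the whole column instead of once per row.
        result.valid.assign(lhs.values.size(), false);
        return result;
    }
    // rhs is non-zero, so output validity equals input validity.
    result.valid = lhs.valid;
    for (std::size_t i = 0; i < lhs.values.size(); i++) {
        if (lhs.valid[i]) {
            result.values[i] = lhs.values[i] / rhs;
        }
    }
    return result;
}
```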

Using libdivide:

name	run	timing
benchmark/micro/arithmetic/division_constrhs.benchmark	1	0.498989
benchmark/micro/arithmetic/division_constrhs.benchmark	2	0.497163
benchmark/micro/arithmetic/division_constrhs.benchmark	3	0.498707
benchmark/micro/arithmetic/division_constrhs.benchmark	4	0.504418
benchmark/micro/arithmetic/division_constrhs.benchmark	5	0.502800

JAicewizard · 2024-01-25T22:40:19Z

Builds are all failing, seemingly because of formatting issues.
As mentioned, the biggest issue I have with the code is that the responsibility for the optimization lies with the wrapper, not with the operation itself. As I am not very familiar with C++ templates, I don't really know how to improve this.
However, if this isn't an issue on your side, I can remove the WIP and fix the formatting issues.

Similar optimisations can be done for the modulo operator. I will file a separate PR for that once this is merged.

Tagging @lnkuiper since I already mentioned this to him on Monday

Mytherin

Thanks for the PR! Great performance results.

Can we create a separate divide_by_const function, and then rewrite x // C to divide_by_const(x, C) as an optimization in the ArithmeticSimplificationRule, instead of doing this optimization within the division operator?
We avoid explicit SIMD instructions in DuckDB - and libdivide seems to contain many of them. In general it seems like a very complex library for what should be a relatively simple optimization. I wonder if we could either strip down libdivide, or switch to something like fastmod which seems to be a lot more simple.
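
To make the suggested rewrite concrete, here is a toy sketch of turning `x // C` into `divide_by_const(x, C)` on a simplified expression tree. The `Expr` type and the helper functions are hypothetical and much simpler than DuckDB's real expression classes and its ArithmeticSimplificationRule. Leaving a zero constant to the regular division operator is also an assumption of this sketch.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified expression node for illustration only.
struct Expr {
    enum class Kind { Column, Constant, Function };
    Kind kind = Kind::Column;
    std::string name;    // column name or function name
    long long value = 0; // only meaningful for constants
    std::vector<std::unique_ptr<Expr>> children;
};

std::unique_ptr<Expr> MakeColumn(std::string name) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = Expr::Kind::Column;
    e->name = std::move(name);
    return e;
}

std::unique_ptr<Expr> MakeConstant(long long value) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = Expr::Kind::Constant;
    e->value = value;
    return e;
}

std::unique_ptr<Expr> MakeFunction(std::string name, std::unique_ptr<Expr> lhs,
                                   std::unique_ptr<Expr> rhs) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = Expr::Kind::Function;
    e->name = std::move(name);
    e->children.push_back(std::move(lhs));
    e->children.push_back(std::move(rhs));
    return e;
}

// Rewrites `x // C` into `divide_by_const(x, C)` when C is a non-zero constant.
std::unique_ptr<Expr> RewriteDivision(std::unique_ptr<Expr> expr) {
    // Rewrite nested expressions first.
    for (auto &child : expr->children) {
        child = RewriteDivision(std::move(child));
    }
    bool is_int_div = expr->kind == Expr::Kind::Function && expr->name == "//" &&
                      expr->children.size() == 2;
    if (is_int_div && expr->children[1]->kind == Expr::Kind::Constant &&
        expr->children[1]->value != 0) {
        expr->name = "divide_by_const";
    }
    return expr;
}
```

With this split, the division operator itself stays generic and the specialised function is only chosen when the optimizer can prove the rhs is constant.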

JAicewizard · 2024-01-26T19:57:30Z

> Can we create a separate divide_by_const function, and then rewrite x // C to divide_by_const(x, C) as an optimization in the ArithmeticSimplificationRule, instead of doing this optimization within the division operator?

I don't know! I don't know a lot of the internals of DuckDB. I can write a function that does this, but I am not sure I can also do the optimization part of it. This does, however, seem like a much cleaner solution.

> We avoid explicit SIMD instructions in DuckDB - and libdivide seems to contain many of them. In general it seems like a very complex library for what should be a relatively simple optimization. I wonder if we could either strip down libdivide, or switch to something like fastmod which seems to be a lot more simple.

I haven't tested fastmod yet; that was the library I was intending to use for the follow-up on modulo. One advantage of libdivide is that it also provides branchless versions, which may prove advantageous for automatic vectorisation. I also don't know how fastmod performs for division.
I think all the explicit vectorisation can be removed; I will look into that.

Mytherin · 2024-01-27T09:22:52Z

> > Can we create a separate divide_by_const function, and then rewrite x // C to divide_by_const(x, C) as an optimization in the ArithmeticSimplificationRule, instead of doing this optimization within the division operator?
>
> I don't know! I don't know a lot of the internals of DuckDB. I can write a function that does this, but I am not sure I can also do the optimization part of it. This does, however, seem like a much cleaner solution.

Have a look here, I think the rewrite should be relatively straightforward.

JAicewizard · 2024-01-28T12:54:37Z

I implemented it as a function instead of an operator. I wasn't entirely sure where to put it, but it can easily be moved of course.

I also looked into using fastmod for division; however, it only supports 32- and 64-bit integers, and doesn't support 64-bit division on MSVC at all. I also measured the performance, and it is significantly slower.

src/core_functions/scalar/math/numeric.cpp

JAicewizard · 2024-01-31T14:13:26Z

I moved this PR to use fastmod. This restricts the optimization to uint32_t, but performance is around the same.

JAicewizard · 2024-02-14T12:31:39Z

I reimplemented fastmod using the DuckDB 128-bit types to allow unsigned as well as signed 32-bit execution. It is even possible to implement the unsigned 64-bit variant using this.

To get reasonably good performance, I needed the fast path of the hugeint multiply to be in the header, as otherwise the compiler can't optimize it. I left the slow path in the C++ file, but with a modern GCC or Clang compiler, calling the multiply function will use the optimized path and be inlined. This makes the multiply a lot faster (if used directly, not via *).
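
As an illustration of the fast-path idea, here is a minimal sketch of a 64x64 to 128-bit multiply that uses the compiler's native 128-bit integer when available and otherwise falls back to a portable routine. The `U128` struct and both function names are hypothetical stand-ins, not DuckDB's hugeint code, and for brevity both paths are shown inline here, whereas the PR keeps the slow path in the C++ file.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 128-bit unsigned value (not DuckDB's type).
struct U128 {
    uint64_t lower;
    uint64_t upper;
};

// Portable slow path: schoolbook multiplication on 32-bit halves.
inline U128 MultiplyPortable(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    // Middle 32-bit column, including the carry out of the lowest column.
    uint64_t mid = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + (lo_hi & 0xFFFFFFFFu);
    U128 result;
    result.lower = (mid << 32) | (lo_lo & 0xFFFFFFFFu);
    result.upper = hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
    return result;
}

// Fast path: visible to the caller so the compiler can inline it.
inline U128 Multiply(uint64_t a, uint64_t b) {
#if defined(__SIZEOF_INT128__)
    // GCC and Clang provide a native 128-bit integer on 64-bit targets.
    unsigned __int128 product = static_cast<unsigned __int128>(a) * b;
    return U128{static_cast<uint64_t>(product), static_cast<uint64_t>(product >> 64)};
#else
    return MultiplyPortable(a, b);
#endif
}
```

Keeping the native path in the header lets the compiler reduce it to a single widening multiply at the call site, which matters when the multiply sits in the inner loop of the division.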

If this looks good, I can easily implement this for uint64 and (u)int{8/16}.

JAicewizard · 2024-08-24T09:11:52Z

@Mytherin @lnkuiper It's been a while, but I rebased and all the tests now pass (not sure why they failed before). Are there any more changes I should make?

Mytherin · 2024-08-26T07:30:26Z

Thanks for the changes! I think this is actually ready to merge, @lnkuiper thoughts?

lnkuiper

@Mytherin I agree, this looks good!

JAicewizard · 2024-08-26T12:21:49Z

Thanks for the merge!

github-actions bot marked this pull request as draft January 26, 2024 11:43

JAicewizard marked this pull request as ready for review January 26, 2024 11:56

JAicewizard changed the title ~~WIP: Optimise division by a constant at runtime for integer division~~ Optimise division by a constant at runtime for integer division Jan 26, 2024

github-actions bot marked this pull request as draft January 26, 2024 12:03

Mytherin reviewed Jan 26, 2024

Mytherin added the Changes Requested label Jan 26, 2024

JAicewizard force-pushed the optimise_intdiv branch 2 times, most recently from 355b92e to fb06827 Compare January 28, 2024 13:11

JAicewizard marked this pull request as ready for review January 28, 2024 13:12

github-actions bot marked this pull request as draft January 28, 2024 21:08

JAicewizard marked this pull request as ready for review January 28, 2024 21:08

github-actions bot marked this pull request as draft January 28, 2024 21:15

JAicewizard marked this pull request as ready for review January 28, 2024 21:16

github-actions bot marked this pull request as draft January 28, 2024 21:21

JAicewizard marked this pull request as ready for review January 28, 2024 21:22

xuke-hat reviewed Jan 30, 2024

github-actions bot marked this pull request as draft January 31, 2024 14:09

JAicewizard force-pushed the optimise_intdiv branch from bf4f550 to aef45de Compare January 31, 2024 14:14

JAicewizard force-pushed the optimise_intdiv branch from 9a5c7b0 to 229b309 Compare February 14, 2024 11:33

JAicewizard requested a review from Mytherin February 14, 2024 12:31

JAicewizard marked this pull request as ready for review February 14, 2024 12:32

github-actions bot marked this pull request as draft February 14, 2024 12:41

JAicewizard force-pushed the optimise_intdiv branch from f89cb84 to 4ff7d3a Compare February 14, 2024 12:54

JAicewizard marked this pull request as ready for review February 14, 2024 12:55

JAicewizard added 2 commits August 23, 2024 20:07

Rename struct

36ac0e8

Remove more duplicate includes

e78d458

JAicewizard force-pushed the optimise_intdiv branch from 5bf7174 to e78d458 Compare August 23, 2024 18:18

duckdb-draftbot marked this pull request as draft August 23, 2024 18:18

JAicewizard marked this pull request as ready for review August 23, 2024 18:24

duckdb-draftbot marked this pull request as draft August 23, 2024 18:36

JAicewizard marked this pull request as ready for review August 23, 2024 18:36

JAicewizard force-pushed the optimise_intdiv branch from 525a8f1 to 5e19280 Compare August 23, 2024 18:42

duckdb-draftbot marked this pull request as draft August 23, 2024 18:42

JAicewizard marked this pull request as ready for review August 23, 2024 18:42

JAicewizard force-pushed the optimise_intdiv branch from 5e19280 to 0651b4d Compare August 23, 2024 18:50

duckdb-draftbot marked this pull request as draft August 23, 2024 18:50

JAicewizard marked this pull request as ready for review August 23, 2024 18:51

JAicewizard force-pushed the optimise_intdiv branch from 0651b4d to e040554 Compare August 23, 2024 18:55

duckdb-draftbot marked this pull request as draft August 23, 2024 18:55

JAicewizard marked this pull request as ready for review August 23, 2024 18:55

??

0f05882

JAicewizard force-pushed the optimise_intdiv branch from e040554 to 0f05882 Compare August 23, 2024 19:11

duckdb-draftbot marked this pull request as draft August 23, 2024 19:12

JAicewizard marked this pull request as ready for review August 23, 2024 19:20

Mytherin changed the base branch from main to feature August 26, 2024 07:30

Mytherin added feature Ready For Review and removed Changes Requested labels Aug 26, 2024

lnkuiper approved these changes Aug 26, 2024

Mytherin merged commit c447522 into duckdb:feature Aug 26, 2024
43 checks passed

JAicewizard deleted the optimise_intdiv branch August 26, 2024 12:21
