Skip to content

Conversation

martinus
Copy link
Contributor

@martinus martinus commented May 6, 2018

Not sure if this is optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod

Be aware that this changes the internal data of the filter, so this should probably
not be used for CBloomFilter because of interoperability problems.

Replaces the slow modulo operation with a much faster 32bit multiplication & shift. This works
because the hash should be uniformly distributed between 0 and 2^32-1. This speeds up the benchmark
by a factor of about 1.3:

RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod

Be aware that this changes the position of the bits that are toggled, so this should probably
not be used for CBloomFilter which is serialized.
@fanquake fanquake requested a review from sipa May 6, 2018 13:08
Copy link
Member

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 9aac9f9

Using FastMod instead of '%' looks correct here, and it indeed can't be used for normal bloom filters (which are normative for the BIP37 protocol).

@laanwj
Copy link
Member

laanwj commented May 7, 2018

Changed the tag: as the result of FastMod is not the same as % this is not only a refactor.

It looks like a feasible alternative in this case (h should be evenly distributed 0..2^32-1 by MurmurHash), and ~25% speed-up is nice. CRollingBloomFilter is used in the net processing code to keep track of recent rejects. This is a fast path (reject-quickly), so I think this matters.

utACK 9aac9f9

// A replacement for x % n. This assumes that x and n are 32bit integers, and x is a uniformly random distributed 32bit value
// which should be the case for a good hash.
// See https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
static inline uint32_t FastMod(uint32_t x, size_t n) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could replace inline with constexpr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that make a difference in the generated code?

Copy link
Member

@sipa sipa May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very unlikely - the sort of optimizations that apply to constexpr things also apply to things which the compiler can infer being constant expressions (which is obvious here).

The advantage to using constexpr is making sure those optimizations aren't made impossible for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it's unlikely. My inclination is to minimize surface area and maximally restrain the code to reduce the space of errors that might occur. It's not always meaningful, but it can be a helpful precaution and can pay incidental dividends, e.g. by surfacing changes to the use of variables over time that were not intended. But yeah, it's a nit rather than an objection.

@jimpo
Copy link
Contributor

jimpo commented May 8, 2018

utACK 9aac9f9

@gmaxwell
Copy link
Contributor

Perhaps share code and or description with cuckoocache.h compute_hashes?

@martinus
Copy link
Contributor Author

@gmaxwell oh right, compute_hashes does the same. What would the preferred way to remove this duplication, refactor cuckoocache.h and to use a static method fast_mod, include & use that in bloom.cpp?

@laanwj
Copy link
Member

laanwj commented May 18, 2018

Normally sharing common code is a good thing but I'm not sure that is worth it here, this is only a few lines, and would involve sharing code between otherwise unrelated units. Would be different if e.g. bloom.cpp already included cuckoocache.h but now the way to now handle this would be to create a new header...

@laanwj laanwj merged commit 9aac9f9 into bitcoin:master May 18, 2018
laanwj added a commit that referenced this pull request May 18, 2018
…s with FastMod

9aac9f9 replace modulus with FastMod (Martin Ankerl)

Pull request description:

  Not sure if this is optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably
  not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a
@martinus martinus deleted the optimize-CRollingBloomFilter branch May 22, 2018 16:26
codablock pushed a commit to codablock/dash that referenced this pull request Apr 11, 2019
… modulus with FastMod

9aac9f9 replace modulus with FastMod (Martin Ankerl)

Pull request description:

  Not sure if this is optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably
  not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a
UdjinM6 pushed a commit to dashpay/dash that referenced this pull request Apr 11, 2019
* Merge bitcoin#13176: Improve CRollingBloomFilter performance: replace modulus with FastMod

9aac9f9 replace modulus with FastMod (Martin Ankerl)

Pull request description:

  Not sure if this is optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably
  not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a

* Use unordered_map in CSporkManager

In one of my profiling sessions with many InstantSend transactions
happening, calls into CSporkManager added up to about 1% of total CPU time.
This is easily avoidable by using unordered maps.

* Use std::unordered_map instead of std::map in limitedmap

* Use unordered_set for CNode::setAskFor

* Add serialization support for unordered maps and sets

* Use unordered_map for mapArgs and mapMultiArgs

* Let limitedmap prune in batches and use unordered_multimap

Due to the batched pruning, there is no need to maintain an ordered map
of values anymore. Only when nPruneAfterSize, there is a need to create
a temporary ordered vector of values to figure out what can be removed.

* Instead of using a multimap for mapAskFor, use a vector which we sort on demand

CNode::AskFor will now push entries into an initially unordered vector
instead of an ordered multimap. Only when we later want to use vecAskFor in
SendMessages, we sort the vector.

The vector will actually be mostly sorted in most cases as insertion order
usually mimics the desired ordering. Only the last few entries might need
some shuffling around. Doing the sort on-demand should be less wasteful
then trying to maintain correct order all the time.

* Fix compilation of tests

* Fix limitedmap tests

* Rename limitedmap to unordered_limitedmap to ensure backports conflict

This ensures that future backports that depends on limitedmap's ordering
conflict so that we are made aware of needed action.

* Fix compilation error on Travis
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 11, 2019
…s with FastMod

Summary:
9aac9f9 replace modulus with FastMod (Martin Ankerl)

Pull request description:

  Not sure if this is optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably
  not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a

Backport of Core PR13176
bitcoin/bitcoin#13176

Test Plan:
  make check
  src/bench/bench_bitcoin -filter=RollingBloom

Repeat above for master and compare.
Before change:
  Benchmark, evals, iterations, total, min, max, median
  RollingBloom, 5, 1500000, 5.16318, 6.61227e-07, 7.3991e-07, 6.70607e-07

After change:
  Benchmark, evals, iterations, total, min, max, median
  RollingBloom, 5, 1500000, 3.73982, 4.92548e-07, 5.10237e-07, 4.95271e-07

Reviewers: deadalnix, Fabien, jasonbcox, O1 Bitcoin ABC, #bitcoin_abc

Reviewed By: deadalnix, O1 Bitcoin ABC, #bitcoin_abc

Differential Revision: https://reviews.bitcoinabc.org/D4160
jonspock pushed a commit to jonspock/devault that referenced this pull request Dec 27, 2019
…s with FastMod

Summary:
9aac9f90d5e56752cc6cbfac48063ad29a01143c replace modulus with FastMod (Martin Ankerl)

Pull request description:

  Not sure if this is optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably
  not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a

Backport of Core PR13176
bitcoin/bitcoin#13176

Test Plan:
  make check
  src/bench/bench_bitcoin -filter=RollingBloom

Repeat above for master and compare.
Before change:
  Benchmark, evals, iterations, total, min, max, median
  RollingBloom, 5, 1500000, 5.16318, 6.61227e-07, 7.3991e-07, 6.70607e-07

After change:
  Benchmark, evals, iterations, total, min, max, median
  RollingBloom, 5, 1500000, 3.73982, 4.92548e-07, 5.10237e-07, 4.95271e-07

Reviewers: deadalnix, Fabien, jasonbcox, O1 Bitcoin ABC, #bitcoin_abc

Reviewed By: deadalnix, O1 Bitcoin ABC, #bitcoin_abc

Differential Revision: https://reviews.bitcoinabc.org/D4160
jonspock pushed a commit to devaultcrypto/devault that referenced this pull request Dec 29, 2019
…s with FastMod

Summary:
9aac9f90d5e56752cc6cbfac48063ad29a01143c replace modulus with FastMod (Martin Ankerl)

Pull request description:

  Not sure if this is optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably
  not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a

Backport of Core PR13176
bitcoin/bitcoin#13176

Test Plan:
  make check
  src/bench/bench_bitcoin -filter=RollingBloom

Repeat above for master and compare.
Before change:
  Benchmark, evals, iterations, total, min, max, median
  RollingBloom, 5, 1500000, 5.16318, 6.61227e-07, 7.3991e-07, 6.70607e-07

After change:
  Benchmark, evals, iterations, total, min, max, median
  RollingBloom, 5, 1500000, 3.73982, 4.92548e-07, 5.10237e-07, 4.95271e-07

Reviewers: deadalnix, Fabien, jasonbcox, O1 Bitcoin ABC, #bitcoin_abc

Reviewed By: deadalnix, O1 Bitcoin ABC, #bitcoin_abc

Differential Revision: https://reviews.bitcoinabc.org/D4160
zkbot added a commit to zcash/zcash that referenced this pull request Mar 5, 2021
Backport bloom filter improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#7113
- bitcoin/bitcoin#7818
  - Only the second commit (to resolve conflicts).
- bitcoin/bitcoin#7934
- bitcoin/bitcoin#8655
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9060
- bitcoin/bitcoin#9223
- bitcoin/bitcoin#9644
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9916
- bitcoin/bitcoin#9750
- bitcoin/bitcoin#13176
- bitcoin/bitcoin#13948
- bitcoin/bitcoin#16073
- bitcoin/bitcoin#18670
- bitcoin/bitcoin#18806
  - Reveals upstream's covert fix for CVE-2013-5700.
- bitcoin/bitcoin#19968
zkbot added a commit to zcash/zcash that referenced this pull request Apr 15, 2021
Backport bloom filter improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#7113
- bitcoin/bitcoin#7818
  - Only the second commit (to resolve conflicts).
- bitcoin/bitcoin#7934
- bitcoin/bitcoin#8655
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9060
- bitcoin/bitcoin#9223
- bitcoin/bitcoin#9644
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9916
- bitcoin/bitcoin#9750
- bitcoin/bitcoin#13176
- bitcoin/bitcoin#13948
- bitcoin/bitcoin#16073
- bitcoin/bitcoin#18670
- bitcoin/bitcoin#18806
  - Reveals upstream's covert fix for CVE-2013-5700.
- bitcoin/bitcoin#19968
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants