Improve CRollingBloomFilter performance: replace modulus with FastMod #13176

martinus · 2018-05-06T12:15:49Z

Not sure if this optimization is necessary, but I have some spare time, so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod

Be aware that this changes the internal data of the filter, so this should probably
not be used for CBloomFilter because of interoperability problems.
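
To make the multiplication & shift concrete, here is a minimal sketch of such a reduction. The signature follows the one quoted in the review of src/bloom.cpp; the body and the `assert` are an illustration under the assumption that `n` fits in 32 bits, not necessarily the exact merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Maps a 32-bit value x into the range [0, n) without using the % operator.
// Reading x / 2^32 as a fraction in [0, 1), the product with n lands in [0, n);
// the shift by 32 drops the fractional part.
static inline uint32_t FastMod(uint32_t x, size_t n)
{
    // Assumption of this sketch: n fits in 32 bits, so the result does too.
    assert(static_cast<uint64_t>(n) <= UINT32_MAX);
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * static_cast<uint64_t>(n)) >> 32);
}
```

With `n = 1000`, the inputs `0`, `0x80000000` and `0xFFFFFFFF` map to `0`, `500` and `999`: the result depends on the high-order bits of `x`, whereas `%` depends on the remainder.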

Replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works because the hash should be uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod

Be aware that this changes the position of the bits that are toggled, so this should probably not be used for CBloomFilter, which is serialized.

sipa

utACK 9aac9f9

Using FastMod instead of '%' looks correct here, and it indeed can't be used for normal bloom filters (which are normative for the BIP37 protocol).
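
A small sketch of why the two reductions are not interchangeable for a filter whose bit positions other implementations must reproduce. The helper names `IndexByModulo` and `IndexByFastMod` are hypothetical and exist only for this comparison.

```cpp
#include <cstdint>

// Hypothetical helpers for comparison; these names are not from the PR.
constexpr uint32_t IndexByModulo(uint32_t h, uint32_t n)
{
    return h % n;
}

constexpr uint32_t IndexByFastMod(uint32_t h, uint32_t n)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(h) * n) >> 32);
}

// Same hash and same filter size, but a different slot is selected.
static_assert(IndexByModulo(12345u, 1000u) == 345u, "modulo keeps the remainder");
static_assert(IndexByFastMod(12345u, 1000u) == 0u, "FastMod scales by the high-order bits");
```

Both functions return a value in `[0, n)`, so either works for a purely local filter, but a peer that computes positions with `%` would set and test different bits.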

laanwj · 2018-05-07T12:02:59Z

Changed the tag: as the result of FastMod is not the same as % this is not only a refactor.

It looks like a feasible alternative in this case (h should be evenly distributed 0..2^32-1 by MurmurHash), and ~25% speed-up is nice. CRollingBloomFilter is used in the net processing code to keep track of recent rejects. This is a fast path (reject-quickly), so I think this matters.

utACK 9aac9f9
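
The uniformity argument can be checked arithmetically. The following sketch (helper names are hypothetical) counts how many of the 2^32 possible inputs a multiply-and-shift reduction sends to a given bucket, assuming `n` is below 2^32. If the bucket sizes differ by at most one, uniformly distributed hashes give (almost exactly) uniformly distributed positions.

```cpp
#include <cstdint>

// Smallest x with (x * n) >> 32 >= k, which is ceil(k * 2^32 / n).
// Assumes k <= n < 2^32 so that the intermediate value fits in 64 bits.
constexpr uint64_t FirstInputForBucket(uint64_t k, uint64_t n)
{
    return ((k << 32) + n - 1) / n;
}

// Number of 32-bit inputs that are mapped to bucket k, for 0 <= k < n.
constexpr uint64_t BucketSize(uint64_t k, uint64_t n)
{
    return FirstInputForBucket(k + 1, n) - FirstInputForBucket(k, n);
}

// For n = 1000, 2^32 / n = 4294967.296, so every bucket holds
// either 4294967 or 4294968 inputs.
static_assert(BucketSize(0, 1000) == 4294968, "first bucket");
static_assert(BucketSize(999, 1000) == 4294967, "last bucket");
```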

Empact · 2018-05-07T17:18:07Z

src/bloom.cpp

+// A replacement for x % n. This assumes that x and n are 32bit integers, and x is a uniformly random distributed 32bit value
+// which should be the case for a good hash.
+// See https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
+static inline uint32_t FastMod(uint32_t x, size_t n) {


nit: Could replace inline with constexpr

laanwj · 2018-05-10

Would that make a difference in the generated code?

sipa · 2018-05-11

That's very unlikely - the sort of optimizations that apply to constexpr things also apply to things which the compiler can infer being constant expressions (which is obvious here).

The advantage to using constexpr is making sure those optimizations aren't made impossible for some reason.

Empact · 2018-05-13

Agreed it's unlikely. My inclination is to minimize surface area and maximally restrain the code to reduce the space of errors that might occur. It's not always meaningful, but it can be a helpful precaution and can pay incidental dividends, e.g. by surfacing changes to the use of variables over time that were not intended. But yeah, it's a nit rather than an objection.
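
For illustration, a sketch of what the constexpr variant suggested in this nit could look like. The body is an assumption based on the PR description, not the merged code.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the suggested variant: constexpr in place of inline.
static constexpr uint32_t FastMod(uint32_t x, size_t n)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * static_cast<uint64_t>(n)) >> 32);
}

// Because the function is constexpr, it can be used in constant expressions,
// so its behaviour can be pinned down at compile time.
static_assert(FastMod(0u, 1000) == 0u, "smallest input maps to the first slot");
static_assert(FastMod(0xFFFFFFFFu, 1000) == 999u, "largest input maps to the last slot");
```

As noted in the thread, the generated code for runtime calls is unlikely to differ; the practical gain is that the compiler rejects later changes that would stop the function from being a constant expression.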

jimpo · 2018-05-08T07:01:15Z

utACK 9aac9f9

gmaxwell · 2018-05-15T00:18:33Z

Perhaps share code and/or description with cuckoocache.h compute_hashes?

martinus · 2018-05-15T06:18:53Z

@gmaxwell oh right, compute_hashes does the same. What would be the preferred way to remove this duplication: refactor cuckoocache.h to use a static method fast_mod, then include & use that in bloom.cpp?

laanwj · 2018-05-18T16:36:33Z

Normally sharing common code is a good thing, but I'm not sure it is worth it here: this is only a few lines, and it would involve sharing code between otherwise unrelated units. It would be different if e.g. bloom.cpp already included cuckoocache.h, but as it stands the way to handle this would be to create a new header...

…s with FastMod

9aac9f9 replace modulus with FastMod (Martin Ankerl)

Pull request description:

Not sure if this optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

```
RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
```

Be aware that this changes the internal data of the filter, so this should probably not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a

* Merge bitcoin#13176: Improve CRollingBloomFilter performance: replace modulus with FastMod

  9aac9f9 replace modulus with FastMod (Martin Ankerl)

  Pull request description:

  Not sure if this optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably not be used for CBloomFilter because of interoperability problems.

  Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a

* Use unordered_map in CSporkManager

  In one of my profiling sessions with many InstantSend transactions happening, calls into CSporkManager added up to about 1% of total CPU time. This is easily avoidable by using unordered maps.

* Use std::unordered_map instead of std::map in limitedmap

* Use unordered_set for CNode::setAskFor

* Add serialization support for unordered maps and sets

* Use unordered_map for mapArgs and mapMultiArgs

* Let limitedmap prune in batches and use unordered_multimap

  Due to the batched pruning, there is no need to maintain an ordered map of values anymore. Only when nPruneAfterSize, there is a need to create a temporary ordered vector of values to figure out what can be removed.

* Instead of using a multimap for mapAskFor, use a vector which we sort on demand

  CNode::AskFor will now push entries into an initially unordered vector instead of an ordered multimap. Only when we later want to use vecAskFor in SendMessages, we sort the vector.

  The vector will actually be mostly sorted in most cases as insertion order usually mimics the desired ordering. Only the last few entries might need some shuffling around. Doing the sort on-demand should be less wasteful than trying to maintain correct order all the time.

* Fix compilation of tests

* Fix limitedmap tests

* Rename limitedmap to unordered_limitedmap to ensure backports conflict

  This ensures that future backports that depend on limitedmap's ordering conflict so that we are made aware of needed action.

* Fix compilation error on Travis

…s with FastMod

Summary:
9aac9f90d5e56752cc6cbfac48063ad29a01143c replace modulus with FastMod (Martin Ankerl)

Pull request description:

Not sure if this optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

```
RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
```

Be aware that this changes the internal data of the filter, so this should probably not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a

Backport of Core PR13176
bitcoin/bitcoin#13176

Test Plan:
make check
src/bench/bench_bitcoin -filter=RollingBloom

Repeat above for master and compare.

Before change:
Benchmark, evals, iterations, total, min, max, median
RollingBloom, 5, 1500000, 5.16318, 6.61227e-07, 7.3991e-07, 6.70607e-07

After change:
Benchmark, evals, iterations, total, min, max, median
RollingBloom, 5, 1500000, 3.73982, 4.92548e-07, 5.10237e-07, 4.95271e-07

Reviewers: deadalnix, Fabien, jasonbcox, O1 Bitcoin ABC, #bitcoin_abc
Reviewed By: deadalnix, O1 Bitcoin ABC, #bitcoin_abc
Differential Revision: https://reviews.bitcoinabc.org/D4160

Backport bloom filter improvements

Cherry-picked from the following upstream PRs:
- bitcoin/bitcoin#7113
- bitcoin/bitcoin#7818
  - Only the second commit (to resolve conflicts).
- bitcoin/bitcoin#7934
- bitcoin/bitcoin#8655
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9060
- bitcoin/bitcoin#9223
- bitcoin/bitcoin#9644
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9916
- bitcoin/bitcoin#9750
- bitcoin/bitcoin#13176
- bitcoin/bitcoin#13948
- bitcoin/bitcoin#16073
- bitcoin/bitcoin#18670
- bitcoin/bitcoin#18806
  - Reveals upstream's covert fix for CVE-2013-5700.
- bitcoin/bitcoin#19968

fanquake added the Refactoring label May 6, 2018

fanquake requested a review from sipa May 6, 2018 13:08

sipa reviewed May 6, 2018

laanwj added Utils/log/libs and removed Refactoring labels May 7, 2018

Empact reviewed May 7, 2018

laanwj merged commit 9aac9f9 into bitcoin:master May 18, 2018

Bushstar mentioned this pull request May 21, 2018

commits from bitcoin/master FeatherCoin/Feathercoin#334

Merged

martinus deleted the optimize-CRollingBloomFilter branch May 22, 2018 16:26

codablock mentioned this pull request Apr 11, 2019

Collection of minor performance optimizations dashpay/dash#2855

Merged

str4d mentioned this pull request Mar 5, 2021

Backport bloom filter improvements zcash/zcash#5026

Merged

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
