-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Improve rolling bloom filter performance and benchmark #7934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
84d71ac
to
67d44f8
Compare
uint32_t h = RollingBloomHash(n, nTweak, vKey); | ||
int bit = h & 0x3F; | ||
uint32_t pos = (h >> 6) % data.size(); | ||
/* The lowest bit of pos is ignored, and set to zero first the first bit, and to one for the second. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set to zero for
firstthe first bit
light utACK 67d44f8, didn't verify masking operations |
This patch changes the implementation from one that stores 16 2-bit integers in one uint32_t's, to one that stores the first bit of 64 2-bit integers in one uint64_t and the second bit in another. This allows for 450x faster refreshing and 2.2x faster average speed.
Addressed @dcousens's nits. |
utACK. |
utACK 1953c40 |
ACK. (appears to work.) |
utACK 1953c40 |
I'd like to understand this code - where do I start? |
uint32_t h = Hash(n, vKey); | ||
put(h, nGeneration); | ||
uint32_t h = RollingBloomHash(n, nTweak, vKey); | ||
int bit = h & 0x3F; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is confusing alternating between 0x3f
and 63
a few times in this code...
In this case it is the theory that is important to understand. With that, the code is pretty straightforward. Google "bloom filters". Most notably the wikipedia page about Bloom filters has a lot of references to CS literature about various kinds of bloom filters, and the article itself may give basic understanding. |
Micro-benchmarking framework part 1 Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#6733 - bitcoin/bitcoin#6770 - bitcoin/bitcoin#6892 - Excluding changes to `src/policy/policy.h` which we don't have yet. - bitcoin/bitcoin#7934 - Just the benchmark, not the performance improvements. - bitcoin/bitcoin#8039 - bitcoin/bitcoin#8107 - bitcoin/bitcoin#8115 - bitcoin/bitcoin#8914 - Required resolving several merge conflicts in code that had been refactored upstream. The changes were simple enough that I decided it was okay to impose merge conflicts on pulling in those refactors later. - bitcoin/bitcoin#9200 - bitcoin/bitcoin#9202 - Adds support for measuring CPU cycles, which is later removed in an upstream PR after the refactor. I am including it to reduce future merge conflicts. - bitcoin/bitcoin#9281 - Only changes to `src/bench/bench.cpp` - bitcoin/bitcoin#9498 - bitcoin/bitcoin#9712 - bitcoin/bitcoin#9547 - bitcoin/bitcoin#9505 - Just the benchmark, not the performance improvements. - bitcoin/bitcoin#9792 - Just the benchmark, not the performance improvements. - bitcoin/bitcoin#10272 - bitcoin/bitcoin#10395 - Only changes to `src/bench/` - bitcoin/bitcoin#10735 - Only changes to `src/bench/base58.cpp` - bitcoin/bitcoin#10963 - bitcoin/bitcoin#11303 - Only the benchmark backend change. - bitcoin/bitcoin#11562 - bitcoin/bitcoin#11646 - bitcoin/bitcoin#11654 This pulls in all changes to the micro-benchmark framework prior to December 2017, when it was rewritten. The rewrite depends on other upstream PRs we have not pulled in yet. This does not pull in all benchmarks prior to December 2017. It leaves out benchmarks that either test code we do not have yet (except for the `FastRandomContext` refactor, which I decided to pull in), or would require rewrites to work with our changes to the codebase.
Backport bloom filter improvements Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#7113 - bitcoin/bitcoin#7818 - Only the second commit (to resolve conflicts). - bitcoin/bitcoin#7934 - bitcoin/bitcoin#8655 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9060 - bitcoin/bitcoin#9223 - bitcoin/bitcoin#9644 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9916 - bitcoin/bitcoin#9750 - bitcoin/bitcoin#13176 - bitcoin/bitcoin#13948 - bitcoin/bitcoin#16073 - bitcoin/bitcoin#18670 - bitcoin/bitcoin#18806 - Reveals upstream's covert fix for CVE-2013-5700. - bitcoin/bitcoin#19968
Backport bloom filter improvements Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#7113 - bitcoin/bitcoin#7818 - Only the second commit (to resolve conflicts). - bitcoin/bitcoin#7934 - bitcoin/bitcoin#8655 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9060 - bitcoin/bitcoin#9223 - bitcoin/bitcoin#9644 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9916 - bitcoin/bitcoin#9750 - bitcoin/bitcoin#13176 - bitcoin/bitcoin#13948 - bitcoin/bitcoin#16073 - bitcoin/bitcoin#18670 - bitcoin/bitcoin#18806 - Reveals upstream's covert fix for CVE-2013-5700. - bitcoin/bitcoin#19968
Added a benchmark for the rolling bloom filter (with parameters corresponding to the tx rejection cache), which showed that on average, adding+checking one item takes around 2.1us, but the refresh action that wipes all old generation items every 60000 iterations takes 65ms, which is very significant.
Thus, this patch also changes the implementation from one that stores 16 2-bit integers in uint32_t's, to one that stores the first bit of 64 2-bit integers in one uint64_t and the second bit in another. This allows for 450x faster refreshing (0.14ms) and 2.2x faster average adding+checking (0.93us).
All benchmarks done on an Intel Core i7-4800MQ CPU, running at 2.6 GHz, with binaries compiled with GCC 5.3.1.