-
Notifications
You must be signed in to change notification settings - Fork 37.7k
doc: clarify CRollingBloomFilter size estimate #19968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c0ea51e
to
01d14f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 01d14f2
I tested this by being new to bitcoin and having only a high-level understanding of bloom filters. The commit made the trade-off between the acceptable false positive rate and memory footprint easy to understand. A couple optional nits below.
src/bloom.h
Outdated
nfilbits = math.ceil(-1.0*nhash*(3*int((nel+1)/2)) / math.log(1.0 - math.exp(logfp / nhash))) | ||
return (nfilbits+63)/64*2*8 | ||
|
||
* If we make these assumptions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I found these next 6 lines added complexity to something that was very clear thus far. Do you think the simpler formula makes it easier to reason about the size of the bloom filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this render on doxygen? If it is markdown style you'd likely want to quote an inline script using triple-backquotes.
TBH I'm not sure the python script adds much in the first place. Sometimes a simple but good-enough approximation is better for documentation purposes. Is this level of precision necessary?
(edit: whoops, didn't mean to click the resolve conversation button)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment: #19130 (comment)
Given how accurate the formula is in all reasonable cases (at most 0.115% off, and usually much less), I don't think it's worth including the code.
The rest of the PR seems useful, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input! Agreed that the extra complexity from the Python script is not worthwhile for the small accuracy gain; I'll update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/bloom.h
Outdated
@@ -94,7 +94,26 @@ class CBloomFilter | |||
* insert()'ed ... but may also return true for items that were not inserted. | |||
* | |||
* It needs around 1.8 bytes per element per factor 0.1 of false positive rate. | |||
* (More accurately: 3/(log(256)*log(2)) * log(1/fpRate) * nElements bytes) | |||
* e.g. for 1000 elements, we'd need: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe add "Lower tolerance for false positives increases the size of the bloom filter manifesting a trade-off between memory and compute (in the case of a false positive)."
01d14f2
to
d9141a0
Compare
ACK d9141a0 |
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
Backport bloom filter improvements Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#7113 - bitcoin/bitcoin#7818 - Only the second commit (to resolve conflicts). - bitcoin/bitcoin#7934 - bitcoin/bitcoin#8655 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9060 - bitcoin/bitcoin#9223 - bitcoin/bitcoin#9644 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9916 - bitcoin/bitcoin#9750 - bitcoin/bitcoin#13176 - bitcoin/bitcoin#13948 - bitcoin/bitcoin#16073 - bitcoin/bitcoin#18670 - bitcoin/bitcoin#18806 - Reveals upstream's covert fix for CVE-2013-5700. - bitcoin/bitcoin#19968
Backport bloom filter improvements Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#7113 - bitcoin/bitcoin#7818 - Only the second commit (to resolve conflicts). - bitcoin/bitcoin#7934 - bitcoin/bitcoin#8655 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9060 - bitcoin/bitcoin#9223 - bitcoin/bitcoin#9644 - Partial backport to help resolve conflicts. - bitcoin/bitcoin#9916 - bitcoin/bitcoin#9750 - bitcoin/bitcoin#13176 - bitcoin/bitcoin#13948 - bitcoin/bitcoin#16073 - bitcoin/bitcoin#18670 - bitcoin/bitcoin#18806 - Reveals upstream's covert fix for CVE-2013-5700. - bitcoin/bitcoin#19968
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0 Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
Based on #19130, this change improves the comment for
CRollingBloomFilter
inbloom.h
: