Skip to content

Conversation

robot-dreams
Copy link
Contributor

Based on #19130, this change improves the comment for CRollingBloomFilter in bloom.h:

  • Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  • Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  • Reconcile the newly added code with the existing approximation

@fanquake fanquake added the Docs label Sep 17, 2020
@robot-dreams robot-dreams force-pushed the bloom-doc branch 2 times, most recently from c0ea51e to 01d14f2 Compare September 18, 2020 04:04
Copy link
Contributor

@dhruv dhruv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 01d14f2

I tested this by being new to bitcoin and having only a high-level understanding of bloom filters. The commit made the trade-off between the acceptable false positive rate and memory footprint easy to understand. A couple optional nits below.

src/bloom.h Outdated
nfilbits = math.ceil(-1.0*nhash*(3*int((nel+1)/2)) / math.log(1.0 - math.exp(logfp / nhash)))
return (nfilbits+63)/64*2*8

* If we make these assumptions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I found these next 6 lines added complexity to something that was very clear thus far. Do you think the simpler formula makes it easier to reason about the size of the bloom filter?

Copy link
Member

@laanwj laanwj Sep 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this render on doxygen? If it is markdown style you'd likely want to quote an inline script using triple-backquotes.

TBH I'm not sure the python script adds much in the first place. Sometimes a simple but good-enough approximation is better for documentation purposes. Is this level of precision necessary?

(edit: whoops, didn't mean to click the resolve conversation button)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair question @laanwj . I picked this up from the "up for grabs" #19130 but I could go either way—if we think the extra precision doesn't really help then we can close this PR.

Copy link
Member

@sipa sipa Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment: #19130 (comment)

Given how accurate the formula is in all reasonable cases (at most 0.115% off, and usually much less), I don't think it's worth including the code.

The rest of the PR seems useful, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input! Agreed that the extra complexity from the Python script is not worthwhile for the small accuracy gain; I'll update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, didn't know about doxygen use before, thanks for the reminder! The formatting should be reasonable now:

Doxygen

src/bloom.h Outdated
@@ -94,7 +94,26 @@ class CBloomFilter
* insert()'ed ... but may also return true for items that were not inserted.
*
* It needs around 1.8 bytes per element per factor 0.1 of false positive rate.
* (More accurately: 3/(log(256)*log(2)) * log(1/fpRate) * nElements bytes)
* e.g. for 1000 elements, we'd need:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe add "Lower tolerance for false positives increases the size of the bloom filter manifesting a trade-off between memory and compute (in the case of a false positive)."

@fanquake fanquake requested review from laanwj and sipa September 29, 2020 12:37
@robot-dreams robot-dreams reopened this Oct 2, 2020
@robot-dreams robot-dreams changed the title doc: make it easier to work out size of bloom filter doc: clarify CRollingBloomFilter size estimate Oct 2, 2020
@laanwj
Copy link
Member

laanwj commented Nov 19, 2020

ACK d9141a0

@laanwj laanwj merged commit e12ad7f into bitcoin:master Nov 19, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 19, 2020
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
zkbot added a commit to zcash/zcash that referenced this pull request Mar 5, 2021
Backport bloom filter improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#7113
- bitcoin/bitcoin#7818
  - Only the second commit (to resolve conflicts).
- bitcoin/bitcoin#7934
- bitcoin/bitcoin#8655
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9060
- bitcoin/bitcoin#9223
- bitcoin/bitcoin#9644
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9916
- bitcoin/bitcoin#9750
- bitcoin/bitcoin#13176
- bitcoin/bitcoin#13948
- bitcoin/bitcoin#16073
- bitcoin/bitcoin#18670
- bitcoin/bitcoin#18806
  - Reveals upstream's covert fix for CVE-2013-5700.
- bitcoin/bitcoin#19968
zkbot added a commit to zcash/zcash that referenced this pull request Apr 15, 2021
Backport bloom filter improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#7113
- bitcoin/bitcoin#7818
  - Only the second commit (to resolve conflicts).
- bitcoin/bitcoin#7934
- bitcoin/bitcoin#8655
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9060
- bitcoin/bitcoin#9223
- bitcoin/bitcoin#9644
  - Partial backport to help resolve conflicts.
- bitcoin/bitcoin#9916
- bitcoin/bitcoin#9750
- bitcoin/bitcoin#13176
- bitcoin/bitcoin#13948
- bitcoin/bitcoin#16073
- bitcoin/bitcoin#18670
- bitcoin/bitcoin#18806
  - Reveals upstream's covert fix for CVE-2013-5700.
- bitcoin/bitcoin#19968
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 27, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 28, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 29, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 15, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 15, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 16, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
gabriel-bjg pushed a commit to gabriel-bjg/dash that referenced this pull request Jul 16, 2021
d9141a0 doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on bitcoin#19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants