Skip to content

Conversation

pstratem
Copy link
Contributor

@pstratem pstratem commented Jun 15, 2020

What's stored in the block filter index data files is the block_hash and the encoded gcs filter.

If there's a disk error that corrupts the encoded filter there's no way for the node to know.

Clients that use the "header" chain will notice the error though.

It takes 1ms to verify the hash on a filter with 10000 elements.

@pstratem
Copy link
Contributor Author

the Travis failure doesn't seem related

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 180e094, did not benchmark the performance cost.

@maflcko
Copy link
Member

maflcko commented Jun 15, 2020

We don't do this when reading blocks from disk and serving them to p2p. Also, with silent corruption on disk, wouldn't it be better to abort the node?

@pstratem pstratem force-pushed the 2020-06-14-blockfilterindex-checksums branch from 180e094 to e13dc3f Compare June 15, 2020 16:18
@pstratem
Copy link
Contributor Author

pstratem commented Jun 15, 2020

@MarcoFalke that's a good point, I've changed it so that we don't run the checks in GCSFilter::GCSFilter when the encoded filter hash has been checked. That check is about twice as expensive as calculating the hash.

@pstratem
Copy link
Contributor Author

I've added the benchmarks I used.

@DrahtBot
Copy link
Contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

pstratem added 3 commits June 15, 2020 19:41
Don't run the (relatively) expensive sanity check in GCSFilter constructor
if we've checked the encoded filter hash.
Also standardize on the BASIC filter parameters so we can compare between all the benchmarks.
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 75e86b1. If I'm understanding this correctly, it is replacing slower less comprehensive checks that filters contain the right number of elements with faster checks verifying complete hashes of the filters. It also adds a benchmark. So this seems better in every respect.

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 75e86b1 per git diff 180e094 75e86b1

Ran comparative benchmarks built with --enable-bench CXXFLAGS="-O2".

This branch rebased on master:

# Benchmark, evals, iterations, total, min, max, median
DecodeCheckedGCSFilter, 100, 50000, 4.28914, 7.30889e-07, 1.02493e-06, 8.58921e-07

DecodeGCSFilter, 10, 5000, 16.2784, 0.000314371, 0.000337642, 0.000329746

BlockFilterGetHash, 10, 10000, 15.4826, 0.000142116, 0.000163642, 0.000158849

ConstructGCSFilter, 10, 1000, 22.988, 0.00217746, 0.00253763, 0.00227277

MatchGCSFilter, 10, 50000, 18.8786, 3.56681e-05, 4.12871e-05, 3.7422e-05

Master, with the standardized BASIC filter params. Ran twice:

# Benchmark, evals, iterations, total, min, max, median
ConstructGCSFilter, 10, 1000, 23.2998, 0.00226351, 0.00248678, 0.00233295
ConstructGCSFilter, 10, 1000, 23.2406, 0.00225238, 0.00251339, 0.00230733

MatchGCSFilter, 10, 50000, 19.3379, 3.70859e-05, 4.26215e-05, 3.85519e-05
MatchGCSFilter, 10, 50000, 18.7122, 3.54355e-05, 3.90555e-05, 3.78147e-05

}
if (!stream.empty()) {
throw std::ios_base::failure("encoded_filter contains excess data");
if (!filter_checked) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3f0be5a If you retouch, returning early might be preferable and a simpler change (feel free to ignore).

+    if (filter_checked) return;
-    if (!filter_checked) {
-       .../...
-       }

@@ -153,7 +154,10 @@ bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& f
std::vector<unsigned char> encoded_filter;
try {
filein >> block_hash >> encoded_filter;
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter));
uint256 result;
CHash256().Write(encoded_filter.data(), encoded_filter.size()).Finalize(result.begin());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a future candidate to become a Span.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?
  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@maflcko
Copy link
Member

maflcko commented Dec 15, 2021

This had two acks before conflicting. @pstratem are you still working on this or should someone pick this up?

@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?
  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@maflcko maflcko closed this Mar 21, 2022
fanquake added a commit that referenced this pull request Jul 7, 2022
…lter from disk.

e734228 Update GCSFilter benchmarks (Calvin Kim)
aee9a81 Add GCSFilterDecodeSkipCheck benchmark (Patrick Strateman)
299023c Add GCSFilterDecode and GCSBlockFilterGetHash benchmarks. (Patrick Strateman)
b0a53d5 Make sanity check in GCSFilter constructor optional (Patrick Strateman)

Pull request description:

  This PR picks up the abandoned #19280

  BlockFilterIndex was depending on `GolombRiceDecode()` during the filter decode to sanity check that the filter wasn't corrupt. However, we can check for corruption by ensuring that the encoded blockfilter's hash matches up with the one stored in the index database.

  Benchmarks that were added in #19280 showed that checking the hash is much faster.

  The benchmarks were changed to nanobench and the relevant benchmarks were like below, showing a clear win for the hash check method.

  ```
  |             ns/elem |              elem/s |    err% |        ins/elem |       bra/elem |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:----------
  |              531.40 |        1,881,819.43 |    0.3% |        3,527.01 |         411.00 |    0.2% |      0.01 | `DecodeCheckedGCSFilter`
  |          258,220.50 |            3,872.66 |    0.1% |    2,990,092.00 |     586,706.00 |    1.7% |      0.01 | `DecodeGCSFilter`
  |           13,036.77 |           76,706.09 |    0.3% |       64,238.24 |         513.04 |    0.2% |      0.01 | `BlockFilterGetHash`
  ```

ACKs for top commit:
  mzumsande:
    Code Review ACK e734228
  theStack:
    Code-review ACK e734228
  stickies-v:
    ACK e734228
  ryanofsky:
    Code review ACK e734228, with caveat that I mostly paid attention to the main code, not the changes to the benchmark. Only changes since last review were changes to the benchmark code.

Tree-SHA512: 02b86eab7b554e1a57a15b17a4d6d71faa91b556c637b0da29f0c9ee76597a110be8e3b4d0c158d4cab04af0623de18b764837be0ec2a72afcfe1ad9c78a83c6
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 8, 2022
… the filter from disk.

e734228 Update GCSFilter benchmarks (Calvin Kim)
aee9a81 Add GCSFilterDecodeSkipCheck benchmark (Patrick Strateman)
299023c Add GCSFilterDecode and GCSBlockFilterGetHash benchmarks. (Patrick Strateman)
b0a53d5 Make sanity check in GCSFilter constructor optional (Patrick Strateman)

Pull request description:

  This PR picks up the abandoned bitcoin#19280

  BlockFilterIndex was depending on `GolombRiceDecode()` during the filter decode to sanity check that the filter wasn't corrupt. However, we can check for corruption by ensuring that the encoded blockfilter's hash matches up with the one stored in the index database.

  Benchmarks that were added in bitcoin#19280 showed that checking the hash is much faster.

  The benchmarks were changed to nanobench and the relevant benchmarks were like below, showing a clear win for the hash check method.

  ```
  |             ns/elem |              elem/s |    err% |        ins/elem |       bra/elem |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:----------
  |              531.40 |        1,881,819.43 |    0.3% |        3,527.01 |         411.00 |    0.2% |      0.01 | `DecodeCheckedGCSFilter`
  |          258,220.50 |            3,872.66 |    0.1% |    2,990,092.00 |     586,706.00 |    1.7% |      0.01 | `DecodeGCSFilter`
  |           13,036.77 |           76,706.09 |    0.3% |       64,238.24 |         513.04 |    0.2% |      0.01 | `BlockFilterGetHash`
  ```

ACKs for top commit:
  mzumsande:
    Code Review ACK e734228
  theStack:
    Code-review ACK e734228
  stickies-v:
    ACK e734228
  ryanofsky:
    Code review ACK e734228, with caveat that I mostly paid attention to the main code, not the changes to the benchmark. Only changes since last review were changes to the benchmark code.

Tree-SHA512: 02b86eab7b554e1a57a15b17a4d6d71faa91b556c637b0da29f0c9ee76597a110be8e3b4d0c158d4cab04af0623de18b764837be0ec2a72afcfe1ad9c78a83c6
@bitcoin bitcoin locked and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants