-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Verify the block filter hash when reading from disk. #19280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Verify the block filter hash when reading from disk. #19280
Conversation
the Travis failure doesn't seem related |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 180e094, did not benchmark the performance cost.
We don't do this when reading blocks from disk and serving them to p2p. Also, with silent corruption on disk, wouldn't it be better to abort the node? |
180e094
to
e13dc3f
Compare
@MarcoFalke that's a good point, I've changed it so that we don't run the checks in GCSFilter::GCSFilter when the encoded filter hash has been checked. That check is about twice as expensive as calculating the hash. |
I've added the benchmarks I used. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Don't run the (relatively) expensive sanity check in GCSFilter constructor if we've checked the encoded filter hash.
Also standardize on the BASIC filter parameters so we can compare between all the benchmarks.
6257a66
to
75e86b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 75e86b1. If I'm understanding this correctly, it is replacing slower less comprehensive checks that filters contain the right number of elements with faster checks verifying complete hashes of the filters. It also adds a benchmark. So this seems better in every respect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 75e86b1 per git diff 180e094 75e86b1
Ran comparative benchmarks built with --enable-bench CXXFLAGS="-O2"
.
This branch rebased on master:
# Benchmark, evals, iterations, total, min, max, median
DecodeCheckedGCSFilter, 100, 50000, 4.28914, 7.30889e-07, 1.02493e-06, 8.58921e-07
DecodeGCSFilter, 10, 5000, 16.2784, 0.000314371, 0.000337642, 0.000329746
BlockFilterGetHash, 10, 10000, 15.4826, 0.000142116, 0.000163642, 0.000158849
ConstructGCSFilter, 10, 1000, 22.988, 0.00217746, 0.00253763, 0.00227277
MatchGCSFilter, 10, 50000, 18.8786, 3.56681e-05, 4.12871e-05, 3.7422e-05
Master, with the standardized BASIC filter params. Ran twice:
# Benchmark, evals, iterations, total, min, max, median
ConstructGCSFilter, 10, 1000, 23.2998, 0.00226351, 0.00248678, 0.00233295
ConstructGCSFilter, 10, 1000, 23.2406, 0.00225238, 0.00251339, 0.00230733
MatchGCSFilter, 10, 50000, 19.3379, 3.70859e-05, 4.26215e-05, 3.85519e-05
MatchGCSFilter, 10, 50000, 18.7122, 3.54355e-05, 3.90555e-05, 3.78147e-05
} | ||
if (!stream.empty()) { | ||
throw std::ios_base::failure("encoded_filter contains excess data"); | ||
if (!filter_checked) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3f0be5a If you retouch, returning early might be preferable and a simpler change (feel free to ignore).
+ if (filter_checked) return;
- if (!filter_checked) {
- .../...
- }
@@ -153,7 +154,10 @@ bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& f | |||
std::vector<unsigned char> encoded_filter; | |||
try { | |||
filein >> block_hash >> encoded_filter; | |||
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter)); | |||
uint256 result; | |||
CHash256().Write(encoded_filter.data(), encoded_filter.size()).Finalize(result.begin()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a future candidate to become a Span
.
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
This had two acks before conflicting. @pstratem are you still working on this or should someone pick this up? |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
…lter from disk. e734228 Update GCSFilter benchmarks (Calvin Kim) aee9a81 Add GCSFilterDecodeSkipCheck benchmark (Patrick Strateman) 299023c Add GCSFilterDecode and GCSBlockFilterGetHash benchmarks. (Patrick Strateman) b0a53d5 Make sanity check in GCSFilter constructor optional (Patrick Strateman) Pull request description: This PR picks up the abandoned #19280 BlockFilterIndex was depending on `GolombRiceDecode()` during the filter decode to sanity check that the filter wasn't corrupt. However, we can check for corruption by ensuring that the encoded blockfilter's hash matches up with the one stored in the index database. Benchmarks that were added in #19280 showed that checking the hash is much faster. The benchmarks were changed to nanobench and the relevant benchmarks were like below, showing a clear win for the hash check method. ``` | ns/elem | elem/s | err% | ins/elem | bra/elem | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:---------- | 531.40 | 1,881,819.43 | 0.3% | 3,527.01 | 411.00 | 0.2% | 0.01 | `DecodeCheckedGCSFilter` | 258,220.50 | 3,872.66 | 0.1% | 2,990,092.00 | 586,706.00 | 1.7% | 0.01 | `DecodeGCSFilter` | 13,036.77 | 76,706.09 | 0.3% | 64,238.24 | 513.04 | 0.2% | 0.01 | `BlockFilterGetHash` ``` ACKs for top commit: mzumsande: Code Review ACK e734228 theStack: Code-review ACK e734228 stickies-v: ACK e734228 ryanofsky: Code review ACK e734228, with caveat that I mostly paid attention to the main code, not the changes to the benchmark. Only changes since last review were changes to the benchmark code. Tree-SHA512: 02b86eab7b554e1a57a15b17a4d6d71faa91b556c637b0da29f0c9ee76597a110be8e3b4d0c158d4cab04af0623de18b764837be0ec2a72afcfe1ad9c78a83c6
… the filter from disk. e734228 Update GCSFilter benchmarks (Calvin Kim) aee9a81 Add GCSFilterDecodeSkipCheck benchmark (Patrick Strateman) 299023c Add GCSFilterDecode and GCSBlockFilterGetHash benchmarks. (Patrick Strateman) b0a53d5 Make sanity check in GCSFilter constructor optional (Patrick Strateman) Pull request description: This PR picks up the abandoned bitcoin#19280 BlockFilterIndex was depending on `GolombRiceDecode()` during the filter decode to sanity check that the filter wasn't corrupt. However, we can check for corruption by ensuring that the encoded blockfilter's hash matches up with the one stored in the index database. Benchmarks that were added in bitcoin#19280 showed that checking the hash is much faster. The benchmarks were changed to nanobench and the relevant benchmarks were like below, showing a clear win for the hash check method. ``` | ns/elem | elem/s | err% | ins/elem | bra/elem | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:---------- | 531.40 | 1,881,819.43 | 0.3% | 3,527.01 | 411.00 | 0.2% | 0.01 | `DecodeCheckedGCSFilter` | 258,220.50 | 3,872.66 | 0.1% | 2,990,092.00 | 586,706.00 | 1.7% | 0.01 | `DecodeGCSFilter` | 13,036.77 | 76,706.09 | 0.3% | 64,238.24 | 513.04 | 0.2% | 0.01 | `BlockFilterGetHash` ``` ACKs for top commit: mzumsande: Code Review ACK e734228 theStack: Code-review ACK e734228 stickies-v: ACK e734228 ryanofsky: Code review ACK e734228, with caveat that I mostly paid attention to the main code, not the changes to the benchmark. Only changes since last review were changes to the benchmark code. Tree-SHA512: 02b86eab7b554e1a57a15b17a4d6d71faa91b556c637b0da29f0c9ee76597a110be8e3b4d0c158d4cab04af0623de18b764837be0ec2a72afcfe1ad9c78a83c6
What's stored in the block filter index data files is the block_hash and the encoded gcs filter.
If there's a disk error that corrupts the encoded filter there's no way for the node to know.
Clients that use the "header" chain will notice the error though.
It takes 1ms to verify the hash on a filter with 10000 elements.