-
Notifications
You must be signed in to change notification settings - Fork 37.7k
RPC/Blockchain: Optimise getblock for simple disk->hex case #21319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d30f7d9
to
33cead5
Compare
33cead5
to
7b23e32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes and context for others looking at this PR:
The getblock <hash>
has different levels of verbosity. A verbosity of 0
returns the block in hex, verbosity 1
in JSON, and further verbosity levels add more transaction details to the JSON.
When calling the getblock <hash>
RPC with a verbosity of 0
we currently (before this PR) de-serialize the raw block stored on disk into a CBlock
(happens in GetBlockChecked()
) and then serialize it back into a hex string. This isn't ideal performance wise (block and tx are de-serialized and (I think) all transactions are hashed in CTransaction
initialization).
This PR adds special handling to the verbosity 0 case to skip the CBlock
de- and serialization. This works only if the block on disk is serialized the same way as the RPC serialization is configured. If you were to manually set -rpcserialversion=0
(currently version 1 (SegWit)
by default), then special case wouldn't work.
Something very similar has been implemented in #13151 for P2P block serving. @MarcoFalke opened PR #17529 attempting to improve the performance (10%-25% on Intel CPUs) by leaving out the CTransaction hashing where not needed.
Comparison:
master: disk -> deserialize -> hash -> serialize -> hex string
#17529: disk -> deserialize -> ____ -> serialize -> hex string
this PR: disk -> ___________ -> ____ -> _________ -> hex string
As @promag mentioned in #13151 (review) in regards to P2P block serving, skipping the de- and serialization here could potentially feed the RPC client corrupted blocks if the data on the disk is corrupted. Can't really judge how problematic that is for RPC.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Closing due to lack of interest. Can reopen if people want to review. |
7b23e32
to
43882cf
Compare
Concept ACK. This RPC is used by a lot of projects and optimizing it would be a big benefit. I think the main problem though is serving potentially corrupted blocks. It was suggested in #13151 (comment) that CRC32Cs can be added to the on disk blocks, which I think would be sufficient. How to do that though I'm not sure of. Ideally they would be in the meta header of the on disk block. Adding them to every block on disk would require rewriting the entire blockchain on already synced nodes, which would take a very long time and not allow downgrading. We would also need to update code that assumes a fixed 8 byte header before the block. Maybe we could add a block file version and only do this for blocks added to the new version type. Alternatively it could be added to the block index. That would take less time to upgrade but would require a migration that would not allow downgrading. I tried adding CRCs as a new index type in #26415, but that requires storing all the 4 byte CRCs along with the 32 byte block hash so it's a lot more data on disk. It also adds the CRCs after the block is stored on the scheduler thread, so potentially a system that got notified of the new block and called |
43882cf
to
6822667
Compare
I don't think we need CRCs to make this change. The code already assumes no corruption in lots of other places. Rebased |
The only other place using |
What happens if you corrupt the chainstate? |
Chainstate is separate from blocksdir. They can live on a different storage device. If an old block file gets corrupted it doesn't affect the chainstate. |
Nevertheless, I don't think we guarantee anything in the case of disk corruption, nor should or can we. |
Well, I think currently we do guarantee we won't serve a corrupt block when we use |
🐙 This pull request conflicts with the target branch and needs rebase. |
Do we? The block header is checked, but not the transaction payload itself |
Indeed, I tested by modifying https://github.com/bitcoin/bitcoin/pull/26415/files#diff-186034bc1b6a0e662a59c81182b3d4a0318d44dc28810d2b837c92033e0d5630R142 in my PR to use |
I don't think there's a benefit to checking the header, or trying to plan for corrupt data in general (aside from startup verification of high-risk/recent data). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
||
if (verbosity <= 0 && !RPCSerializationFlags()) { | ||
// This one case doesn't need to parse the block at all | ||
if (!node::ReadRawBlockFromDisk(raw_block, pblockindex->GetBlockPos(), Params().MessageStart())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe move this into a GetRawBlockChecked to avoid the duplicate pruned check when GetBlockChecked
below is called?
Still needs a rebase. Maybe someone wants to take this over? |
I can add this functionality to #26415. |
@andrewtoth ok. Will close this for now. |
1c4b9cb bench: add readblock benchmark (Andrew Toth) Pull request description: Requested in bitcoin/bitcoin#13151 (comment). See bitcoin/bitcoin#26415 and bitcoin/bitcoin#21319. Benchmarking shows a >50x increase in speed on both nvme and spinning disk. Benchmark results: | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 5,377,375.00 | 185.96 | 0.2% | 60,125,513.00 | 11,633,676.00 | 5.168 | 3,588,800.00 | 0.4% | 0.09 | `ReadBlockFromDiskTest` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 89,945.58 | 11,117.83 | 0.7% | 12,743.90 | 64,530.33 | 0.197 | 2,595.20 | 0.2% | 0.01 | `ReadRawBlockFromDiskTest` ACKs for top commit: maflcko: lgtm ACK 1c4b9cb achow101: ACK 1c4b9cb TheCharlatan: ACK 1c4b9cb Tree-SHA512: 71dbcd6c7e2be97eb3001e35d0a95ef8e0c9b10dc9193025c7f8e11a09017fa2fbf89489b686353cd88fb409fb729fe2c4a25c567d2988f64c9c164ab09fba9f
Implemented #17529 (comment)