Skip to content

Conversation

luke-jr
Copy link
Member

@luke-jr luke-jr commented Feb 28, 2021

Implemented #17529 (comment)

@luke-jr luke-jr force-pushed the getblock_optimise branch from 33cead5 to 7b23e32 Compare March 3, 2021 23:37
Copy link
Contributor

@0xB10C 0xB10C left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes and context for others looking at this PR:

The getblock <hash> has different levels of verbosity. A verbosity of 0 returns the block in hex, verbosity 1 in JSON, and further verbosity levels add more transaction details to the JSON.

When calling the getblock <hash> RPC with a verbosity of 0 we currently (before this PR) de-serialize the raw block stored on disk into a CBlock (happens in GetBlockChecked()) and then serialize it back into a hex string. This isn't ideal performance wise (block and tx are de-serialized and (I think) all transactions are hashed in CTransaction initialization).

This PR adds special handling to the verbosity 0 case to skip the CBlock de- and serialization. This works only if the block on disk is serialized the same way as the RPC serialization is configured. If you were to manually set -rpcserialversion=0 (currently version 1 (SegWit) by default), then special case wouldn't work.

Something very similar has been implemented in #13151 for P2P block serving. @MarcoFalke opened PR #17529 attempting to improve the performance (10%-25% on Intel CPUs) by leaving out the CTransaction hashing where not needed.

Comparison:

master:   disk -> deserialize -> hash -> serialize -> hex string
#17529:   disk -> deserialize -> ____ -> serialize -> hex string
this PR:  disk -> ___________ -> ____ -> _________ -> hex string

As @promag mentioned in #13151 (review) in regards to P2P block serving, skipping the de- and serialization here could potentially feed the RPC client corrupted blocks if the data on the disk is corrupted. Can't really judge how problematic that is for RPC.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 28, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK andrewtoth

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26415 (rpc,rest: faster getblock and rest_block by reading raw block by andrewtoth)
  • #25232 (rpc: Faster getblock API by AaronDewes)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@luke-jr
Copy link
Member Author

luke-jr commented Jul 28, 2021

Closing due to lack of interest. Can reopen if people want to review.

@luke-jr luke-jr closed this Jul 28, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 18, 2022
@bitcoin bitcoin unlocked this conversation Oct 30, 2022
@luke-jr luke-jr reopened this Nov 5, 2022
@andrewtoth
Copy link
Contributor

Concept ACK. This RPC is used by a lot of projects and optimizing it would be a big benefit.

I think the main problem though is serving potentially corrupted blocks. It was suggested in #13151 (comment) that CRC32Cs can be added to the on disk blocks, which I think would be sufficient. How to do that though I'm not sure of. Ideally they would be in the meta header of the on disk block. Adding them to every block on disk would require rewriting the entire blockchain on already synced nodes, which would take a very long time and not allow downgrading. We would also need to update code that assumes a fixed 8 byte header before the block. Maybe we could add a block file version and only do this for blocks added to the new version type.

Alternatively it could be added to the block index. That would take less time to upgrade but would require a migration that would not allow downgrading.

I tried adding CRCs as a new index type in #26415, but that requires storing all the 4 byte CRCs along with the 32 byte block hash so it's a lot more data on disk. It also adds the CRCs after the block is stored on the scheduler thread, so potentially a system that got notified of the new block and called getblock wouldn't have the CRC yet.

@luke-jr
Copy link
Member Author

luke-jr commented Nov 5, 2022

I don't think we need CRCs to make this change. The code already assumes no corruption in lots of other places.

Rebased

@andrewtoth
Copy link
Contributor

The only other place using ReadRawBlockFromDisk is in net processing, where we assume the peer will validate correctly and disconnect. The same assumption can't be made for consumers of the RPC. All other block reads deserialize the block which will fail if the data is corrupt.

@luke-jr
Copy link
Member Author

luke-jr commented Nov 5, 2022

What happens if you corrupt the chainstate?

@andrewtoth
Copy link
Contributor

Chainstate is separate from blocksdir. They can live on a different storage device. If an old block file gets corrupted it doesn't affect the chainstate.

@luke-jr
Copy link
Member Author

luke-jr commented Nov 6, 2022

Nevertheless, I don't think we guarantee anything in the case of disk corruption, nor should or can we.

@andrewtoth
Copy link
Contributor

Well, I think currently we do guarantee we won't serve a corrupt block when we use ReadBlockFromDisk.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 8, 2022

🐙 This pull request conflicts with the target branch and needs rebase.

@maflcko
Copy link
Member

maflcko commented Dec 8, 2022

Well, I think currently we do guarantee we won't serve a corrupt block when we use ReadBlockFromDisk.

Do we?

The block header is checked, but not the transaction payload itself

@andrewtoth
Copy link
Contributor

andrewtoth commented Dec 8, 2022

The block header is checked, but not the transaction payload itself

Indeed, I tested by modifying https://github.com/bitcoin/bitcoin/pull/26415/files#diff-186034bc1b6a0e662a59c81182b3d4a0318d44dc28810d2b837c92033e0d5630R142 in my PR to use ReadBlockFromDisk and it passes, but if I switch it to a byte inside the header (like 50), it fails. Looks like I was mistaken. The risk is just limited to the header then. Would it be worth it to modify ReadRawBlockFromDIsk to check the header hash at least then, or does that remove enough of the speed benefit here?

@luke-jr
Copy link
Member Author

luke-jr commented Dec 11, 2022

I don't think there's a benefit to checking the header, or trying to plan for corrupt data in general (aside from startup verification of high-risk/recent data).

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm


if (verbosity <= 0 && !RPCSerializationFlags()) {
// This one case doesn't need to parse the block at all
if (!node::ReadRawBlockFromDisk(raw_block, pblockindex->GetBlockPos(), Params().MessageStart())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe move this into a GetRawBlockChecked to avoid the duplicate pruned check when GetBlockChecked below is called?

@fanquake
Copy link
Member

fanquake commented Feb 6, 2023

Still needs a rebase. Maybe someone wants to take this over?

@andrewtoth
Copy link
Contributor

I can add this functionality to #26415.

@fanquake
Copy link
Member

fanquake commented Feb 6, 2023

I can add this functionality to #26415.

@andrewtoth ok. Will close this for now.

@fanquake fanquake closed this Feb 6, 2023
achow101 added a commit to bitcoin-core/gui that referenced this pull request Jan 2, 2024
1c4b9cb bench: add readblock benchmark (Andrew Toth)

Pull request description:

  Requested in bitcoin/bitcoin#13151 (comment).
  See bitcoin/bitcoin#26415 and bitcoin/bitcoin#21319.

  Benchmarking shows a >50x increase in speed on both nvme and spinning disk.

  Benchmark results:
  |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |        5,377,375.00 |              185.96 |    0.2% |   60,125,513.00 |   11,633,676.00 |  5.168 |   3,588,800.00 |    0.4% |      0.09 | `ReadBlockFromDiskTest`

  |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |           89,945.58 |           11,117.83 |    0.7% |       12,743.90 |       64,530.33 |  0.197 |       2,595.20 |    0.2% |      0.01 | `ReadRawBlockFromDiskTest`

ACKs for top commit:
  maflcko:
    lgtm ACK 1c4b9cb
  achow101:
    ACK 1c4b9cb
  TheCharlatan:
    ACK 1c4b9cb

Tree-SHA512: 71dbcd6c7e2be97eb3001e35d0a95ef8e0c9b10dc9193025c7f8e11a09017fa2fbf89489b686353cd88fb409fb729fe2c4a25c567d2988f64c9c164ab09fba9f
@bitcoin bitcoin locked and limited conversation to collaborators Feb 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants