-
Notifications
You must be signed in to change notification settings - Fork 37.7k
net: Serve blocks directly from disk when possible #13151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c032b99
to
4c790df
Compare
Could you please add a benchmark to |
Sure, though I'm not sure how to do that; none of the benches actually uses (I also don't think it will work on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
Couldn't we serve corrupted blocks?
src/validation.cpp
Outdated
|
||
try { | ||
block.resize(nSize); | ||
filein.read((char*)block.data(), nSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to throw out the idea, mmap
wouldn't pay off right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's a win here, as the entire block is read consecutively - could even be slower as it'd have to create and destroy the mapping. Also it's not portable.
Concept ACK. Would be nice to see how much the additional savings are on top of #13098. |
At least it's a lot simpler. |
Yes, that's a possibility, though only if the underlying storage is corrupted. I've posited the idea to add a CRC32C to the on-disk blocks at some point (which is quick to verify, especially with specialized instructions, and should protect against accidental corruptions), but that's quite an invasive change. It's something that could be done later. The only option to verify with the current information would be to do a Merkle tree check - which could be done without deserialization but it's not pretty... (and a serious overhead SHA256-hashing) |
I don't see that we currently do this, so it wouldn't make anything worse here. |
I think serving a corrupted block if our state is corrupted is fine, the peer will just disconnect us and go get the block from someone else, seems pretty harmless! This is a much smaller change than I was expecting-- in particular I forgot there was a size, light review ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK except for the below:
src/validation.cpp
Outdated
} | ||
|
||
try { | ||
block.resize(nSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want to check the size is sane before we do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. What constant would be appropriate here?
Edit: I'll go with MAX_SIZE from serialize.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing I wondered here: what is the C++11 proper way to allocate a vector (or a RAII memory area) without zeroing it? I think that's unnecessary here.
That was a bad idea: even though we handle errors while reading, as this data is sent directly over P2P, zeroing is defense-in-depth against heartbleed-style issues here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will measure some round trips tomorrow.
src/net_processing.cpp
Outdated
} else if (inv.type == MSG_WITNESS_BLOCK) { | ||
// Fast-path: in this case it is possible to serve the block directly from disk, | ||
// as the network format matches the format on disk | ||
LogPrintf("debug: Serving raw block directly from disk: %s\n", pindex->ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably remove this debug logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. I added it while WIP so that people testing this can be sure that the code actually triggers and they're testing the right thing.
utACK 4c790dff7481d1464a906ad6b17a3179a7da3431 This would probably also speedup an external indexing daemon via p2p (see experiment in https://github.com/jonasschnelli/bitcoincore-indexd [very WIP]) Here a flamegraph of serving the first 200k blocks via p2p localhost (though the real deser/ser starts probably at higher up in the chain). |
src/validation.cpp
Outdated
|
||
CMessageHeader::MessageStartChars msg_start_in; | ||
unsigned int nSize; | ||
filein >> msg_start_in >> nSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this deserialization logic should be within the try {}.
OK, thanks for review everyone, removed WIP tag and pushed commits with the following changes:
Will squash if no further issues.
Yeah - I saw in #13098 that @MarcoFalke had also optimized the zmq full-block notification part. I'm not sure that is any critical path performance-wise (maybe for your indexer?) but if so I'll leave that for a later PR.
Did you forget to post the link? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK after squash.
src/net_processing.cpp
Outdated
// as the network format matches the format on disk | ||
std::vector<uint8_t> block_data; | ||
if (!ReadRawBlockFromDisk(block_data, pindex, chainparams.MessageStart())) | ||
assert(!"cannot load block from disk"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: braces around then-branch if on a separate line.
Thanks for benchmarking @MarcoFalke. I used a patched version of @jonasschnelli's bitcoincore-indexd to benchmark the time for fetching block 0..473600 through P2P, with no processing client-side. The result is:
10% speedup. And in my case is the blocks are on a slow harddisk. I expect the gains to be more significant in case of a faster storage medium, or slower CPU. |
|
||
block.resize(blk_size); // Zeroing of memory is intentional here | ||
filein.read((char*)block.data(), blk_size); | ||
} catch(const std::exception& e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, space after catch } catch (...
.
blk_size, MAX_SIZE); | ||
} | ||
|
||
block.resize(blk_size); // Zeroing of memory is intentional here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zeroing of memory is intentional
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid heartbleed-type leaks as this data goes directly over the network.
Did 10 rounds of requesting blocks in range 490'000 up to 500'000 on master and got.
Master (
|
@jonasschnelli That's really strange. As reported, I did see some actual speed-ups where using this. |
If someone wants to compare master against this PR built in the same environment: PR: https://bitcoin.jonasschnelli.ch/build/600 |
I think we should definitively look into why it is slower to sync, since that indicates a problem (potentially in our code) exists elsewhere. |
I did the same experiment as @jonasschnelli, a modified
~66% speedup. This is with a spinning rust disk not a SSD. This was done on a different computer than my previous test. |
I think this is a clear benefit for spinning disk and probably also for ssd in non absurd localhost cases. utACK |
@jonasschnelli Did you have a chance to look into why your result was unexpected? |
@MarcoFalke: no. I haven't but I'm willing to do as soon as someone could confirm my results (SSD test). |
Sure, will do
…On Mon, May 14, 2018, 11:41 Jonas Schnelli ***@***.***> wrote:
@MarcoFalke <https://github.com/MarcoFalke>: no. I haven't but I'm
willing to do as soon as someone could confirm my results (SSD test).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#13151 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGGmv9-r1eVW0U-ZpIGMl9OABHCU7k27ks5tyaWYgaJpZM4TvWjI>
.
|
For clarity: 598db means master@598db and 9893e means this pull request. I used @laanwj's branch of bitcoincore-indexd. |
@jonasschnelli I coulnd't find the branch you were using. Mind to share, otherwise I can't reproduce? |
Same experiment as #13151 (comment) on i.MX6Q ARM board w/ USB2 spinning disk:
Seems to help even with a slow CPU and slow I/O. |
In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.
9893e71
to
0bf4318
Compare
squashed, no other changes -- |
I guess my setup was either faulty or there is a performance loss with that particular setup (>1000MB/s IO r&w on very fast CPUs). |
@jonasschnelli I can't explain why, but you might want to try to drop the files cached in memory with |
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp
…le (#3408) * Merge bitcoin#13151: net: Serve blocks directly from disk when possible 0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp * fix extra bracket Signed-off-by: Pasta <pasta@dashboost.org> * remove read raw block from disk functions Signed-off-by: Pasta <pasta@dashboost.org> * correct `if (pblock)` scope Signed-off-by: Pasta <pasta@dashboost.org> Co-authored-by: Wladimir J. van der Laan <laanwj@gmail.com>
In
ProcessGetBlockData
, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.