net: Serve blocks directly from disk when possible #13151

laanwj · 2018-05-02T12:17:26Z

In ProcessGetBlockData, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

maflcko · 2018-05-02T12:34:45Z

Could you please add a benchmark to ./src/bench/checkblock.cpp, so it is easier to see how much this improves?

laanwj · 2018-05-02T12:38:10Z

Sure, though I'm not sure how to do that; none of the benches actually uses ReadBlockFromDisk, I would have to set up a fake block index or such.

(I also don't think it will work on block413567 as-is because it has no magic/size header, and is not a file on disk, though it's easy enough to write a temporary file of course). [When doing this from memory there's effectively nothing to benchmark, too.]

promag

Concept ACK.

Couldn't we serve corrupted blocks?

promag · 2018-05-02T14:00:40Z

src/validation.cpp

+
+    try {
+        block.resize(nSize);
+        filein.read((char*)block.data(), nSize);


Just to throw out the idea, mmap wouldn't pay off right?

I don't think that's a win here, as the entire block is read consecutively - could even be slower as it'd have to create and destroy the mapping. Also it's not portable.

src/net_processing.cpp

maflcko · 2018-05-02T14:24:32Z

Concept ACK. Would be nice to see how much the additional savings are on top of #13098.

laanwj · 2018-05-02T14:28:32Z

Concept ACK. Would be nice to see how much the additional savings are on top of #13098.

At least it's a lot simpler.

laanwj · 2018-05-02T16:25:11Z

Couldn't we serve corrupted blocks?

Yes, that's a possibility, though only if the underlying storage is corrupted. I've posited the idea to add a CRC32C to the on-disk blocks at some point (which is quick to verify, especially with specialized instructions, and should protect against accidental corruptions), but that's quite an invasive change. It's something that could be done later.

The only option to verify with the current information would be to do a Merkle tree check - which could be done without deserialization but it's not pretty... (and a serious overhead SHA256-hashing)

maflcko · 2018-05-02T16:46:20Z

The only option to verify with the current information would be to do a Merkle tree check

I don't see that we currently do this, so it wouldn't make anything worse here.

gmaxwell · 2018-05-02T17:53:29Z

I think serving a corrupted block if our state is corrupted is fine, the peer will just disconnect us and go get the block from someone else, seems pretty harmless!

This is a much smaller change than I was expecting-- in particular I forgot there was a size, light review ACK.

TheBlueMatt

utACK except for the below:

TheBlueMatt · 2018-05-02T19:49:45Z

src/validation.cpp

+    }
+
+    try {
+        block.resize(nSize);


Probably want to check the size is sane before we do this.

Good point. What constant would be appropriate here?
Edit: I'll go with MAX_SIZE from serialize.h.

~~Another thing I wondered here: what is the C++11 proper way to allocate a vector (or a RAII memory area) without zeroing it? I think that's unnecessary here.~~
That was a bad idea: even though we handle errors while reading, as this data is sent directly over P2P, zeroing is defense-in-depth against heartbleed-style issues here

maflcko

Will measure some round trips tomorrow.

maflcko · 2018-05-02T20:20:13Z

src/net_processing.cpp

+        } else if (inv.type == MSG_WITNESS_BLOCK) {
+            // Fast-path: in this case it is possible to serve the block directly from disk,
+            // as the network format matches the format on disk
+            LogPrintf("debug: Serving raw block directly from disk: %s\n", pindex->ToString());


Should probably remove this debug logging

Yes, definitely. I added it while WIP so that people testing this can be sure that the code actually triggers and they're testing the right thing.

src/net_processing.cpp

jonasschnelli · 2018-05-03T06:44:11Z

utACK 4c790dff7481d1464a906ad6b17a3179a7da3431

This would probably also speedup an external indexing daemon via p2p (see experiment in https://github.com/jonasschnelli/bitcoincore-indexd [very WIP])

Here a flamegraph of serving the first 200k blocks via p2p localhost (though the real deser/ser starts probably at higher up in the chain).

laanwj · 2018-05-03T08:25:23Z

src/validation.cpp

+
+    CMessageHeader::MessageStartChars msg_start_in;
+    unsigned int nSize;
+    filein >> msg_start_in >> nSize;


I guess this deserialization logic should be within the try {}.

laanwj · 2018-05-03T08:49:25Z

OK, thanks for review everyone, removed WIP tag and pushed commits with the following changes:

Remove extraneous debug log message
Check nSize against MAX_SIZE
Move deserialization of msg_start_in, and size into exception try
Improved variable and parameter naming

Will squash if no further issues.

This would probably also speedup an external indexing daemon via p2p (see experiment in https://github.com/jonasschnelli/bitcoincore-indexd [very WIP])

Yeah - I saw in #13098 that @MarcoFalke had also optimized the zmq full-block notification part. I'm not sure that is any critical path performance-wise (maybe for your indexer?) but if so I'll leave that for a later PR.

Here a flamegraph of serving the first 200k blocks via p2p localhost (though the real deser/ser starts probably at higher up in the chain).

Did you forget to post the link?

sipa

utACK after squash.

sipa · 2018-05-03T23:25:33Z

src/net_processing.cpp

+            // as the network format matches the format on disk
+            std::vector<uint8_t> block_data;
+            if (!ReadRawBlockFromDisk(block_data, pindex, chainparams.MessageStart()))
+                assert(!"cannot load block from disk");


Nit: braces around then-branch if on a separate line.

maflcko · 2018-05-04T00:33:55Z

Running with e0223ebf0c58f7beedea91df48e9586154cd4436 and just looking at the wall clock time for reading+optional deserialization shows for me on an ssd:

Edit: Note that this was done with full fake blocks and not real blocks from the network.

laanwj · 2018-05-04T15:53:25Z

Thanks for benchmarking @MarcoFalke.

I used a patched version of @jonasschnelli's bitcoincore-indexd to benchmark the time for fetching block 0..473600 through P2P, with no processing client-side. The result is:

With patch:
real    63m51.273s

Without patch:
real    70m28.956s

10% speedup. And in my case is the blocks are on a slow harddisk. I expect the gains to be more significant in case of a faster storage medium, or slower CPU.

promag · 2018-05-07T13:39:41Z

src/validation.cpp

+
+        block.resize(blk_size); // Zeroing of memory is intentional here
+        filein.read((char*)block.data(), blk_size);
+    } catch(const std::exception& e) {


nit, space after catch } catch (....

promag · 2018-05-07T13:43:20Z

src/validation.cpp

+                    blk_size, MAX_SIZE);
+        }
+
+        block.resize(blk_size); // Zeroing of memory is intentional here


Zeroing of memory is intentional

Why?

To avoid heartbleed-type leaks as this data goes directly over the network.

jonasschnelli · 2018-05-07T15:10:18Z

Did 10 rounds of requesting blocks in range 490'000 up to 500'000 on master and got.
Setup:

Non VM machine
SSD 1400MB/s
Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz
txindex was enabled
connect=0 --whitebind=127.0.0.1:8333
no other resource intense applications where running on that system
used a modified version of https://github.com/laanwj/bitcoincore-indexd

Master (`-g -O2`):

95211ms in avg. (all rounds where very similar and there was no need to exclude the first round)
exemption

This PR (`-g O2`):

101051ms in avg.

I can't figure out why this PR performs ~6% slower and I double checked the comparison by manually rolling back from the head of this PR to 598db38 and back to head. Also added the LogPrintfs back to ensure I'm using the "fast track" (removed again during benchmarking).

sidenode: found out that -debug=net (which was disabled during the rounds reported above) account for a 3% slowdown for the above test-scenario.

laanwj · 2018-05-09T05:35:34Z

@jonasschnelli That's really strange. As reported, I did see some actual speed-ups where using this.
Maybe someone else can try some measurements, or we should just close, I don't know.

jonasschnelli · 2018-05-09T08:47:31Z

If someone wants to compare master against this PR built in the same environment:

PR: https://bitcoin.jonasschnelli.ch/build/600
master: https://bitcoin.jonasschnelli.ch/build/599

maflcko · 2018-05-09T18:12:05Z

I updated my benchmark to also include the time it takes to Make (serialize) the net message:

maflcko · 2018-05-09T18:15:29Z

I think we should definitively look into why it is slower to sync, since that indicates a problem (potentially in our code) exists elsewhere.

laanwj · 2018-05-13T17:45:40Z

I did the same experiment as @jonasschnelli, a modified bitcoincore-indexd that requests block 490000..500000 (https://github.com/laanwj/bitcoincore-indexd/tree/bench). Tried both cases 5 times;

with patch:
real    0m55.928s
real    0m55.986s
real    0m55.913s
real    0m55.844s
real    0m55.790s

without patch (using the commit before):
real    2m47.673s
real    2m46.329s
real    2m46.634s
real    2m46.413s
real    2m46.458s

~66% speedup. This is with a spinning rust disk not a SSD. This was done on a different computer than my previous test.

jonasschnelli · 2018-05-14T15:19:01Z

I think this is a clear benefit for spinning disk and probably also for ssd in non absurd localhost cases.

utACK

maflcko · 2018-05-14T15:27:19Z

@jonasschnelli Did you have a chance to look into why your result was unexpected?

jonasschnelli · 2018-05-14T15:40:19Z

@MarcoFalke: no. I haven't but I'm willing to do as soon as someone could confirm my results (SSD test).

maflcko · 2018-05-14T16:04:55Z

Sure, will do

…

On Mon, May 14, 2018, 11:41 Jonas Schnelli ***@***.***> wrote: @MarcoFalke <https://github.com/MarcoFalke>: no. I haven't but I'm willing to do as soon as someone could confirm my results (SSD test). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#13151 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGGmv9-r1eVW0U-ZpIGMl9OABHCU7k27ks5tyaWYgaJpZM4TvWjI> .

maflcko · 2018-05-14T21:04:56Z

For clarity: 598db means master@598db and 9893e means this pull request. I used @laanwj's branch of bitcoincore-indexd.

maflcko · 2018-05-14T21:06:28Z

@jonasschnelli I coulnd't find the branch you were using. Mind to share, otherwise I can't reproduce?

laanwj · 2018-05-15T06:05:18Z

Same experiment as #13151 (comment) on i.MX6Q ARM board w/ USB2 spinning disk:

with patch:
real    10m18.368s
real    11m14.600s
real    10m12.006s
real    10m21.668s
real    10m11.070s

without patch (using the commit before):
real    27m30.574s
real    26m27.591s
real    25m38.311s
real    25m36.661s
real    25m40.902s

Seems to help even with a slow CPU and slow I/O.

In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

laanwj · 2018-05-15T06:13:48Z

squashed, no other changes --
9893e712e9e04e8b9478e36e0b5d843899540bd2 → 0bf4318

jonasschnelli · 2018-05-15T18:23:09Z

I guess my setup was either faulty or there is a performance loss with that particular setup (>1000MB/s IO r&w on very fast CPUs).
However, this PR is a clear and significant win!

maflcko · 2018-05-15T18:57:47Z

@jonasschnelli I can't explain why, but you might want to try to drop the files cached in memory with sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'. For me this was speeding up the sync on an ssd.

0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472

0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp

…le (#3408) * Merge bitcoin#13151: net: Serve blocks directly from disk when possible 0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan) Pull request description: In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format. This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided. Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472 Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/net_processing.cpp # src/validation.cpp * fix extra bracket Signed-off-by: Pasta <pasta@dashboost.org> * remove read raw block from disk functions Signed-off-by: Pasta <pasta@dashboost.org> * correct `if (pblock)` scope Signed-off-by: Pasta <pasta@dashboost.org> Co-authored-by: Wladimir J. van der Laan <laanwj@gmail.com>

laanwj mentioned this pull request May 2, 2018

Skip tx-rehashing on historic blocks #13098

Closed

fanquake added P2P Validation labels May 2, 2018

laanwj force-pushed the 2018_05_direct_from_disk branch from c032b99 to 4c790df Compare May 2, 2018 12:33

promag reviewed May 2, 2018

View reviewed changes

maflcko reviewed May 2, 2018

View reviewed changes

src/net_processing.cpp Show resolved Hide resolved

TheBlueMatt reviewed May 2, 2018

View reviewed changes

maflcko reviewed May 2, 2018

View reviewed changes

laanwj commented May 3, 2018

View reviewed changes

laanwj changed the title ~~WIP: net: Serve blocks directly from disk when possible~~ net: Serve blocks directly from disk when possible May 3, 2018

sipa reviewed May 3, 2018

View reviewed changes

promag reviewed May 7, 2018

View reviewed changes

laanwj force-pushed the 2018_05_direct_from_disk branch from 9893e71 to 0bf4318 Compare May 15, 2018 06:13

laanwj merged commit 0bf4318 into bitcoin:master May 23, 2018

Bushstar mentioned this pull request May 26, 2018

commits from bitcoin/master FeatherCoin/Feathercoin#336

Merged

maflcko mentioned this pull request Nov 20, 2019

rpc: Faster getblock using PureBlock #17529

Closed

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

net: Serve blocks directly from disk when possible #13151

net: Serve blocks directly from disk when possible #13151

Uh oh!

Conversation

laanwj commented May 2, 2018

Uh oh!

maflcko commented May 2, 2018

Uh oh!

laanwj commented May 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

promag left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laanwj May 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

maflcko commented May 2, 2018

Uh oh!

laanwj commented May 2, 2018

Uh oh!

laanwj commented May 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko commented May 2, 2018

Uh oh!

gmaxwell commented May 2, 2018

Uh oh!

TheBlueMatt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laanwj May 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laanwj May 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maflcko left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jonasschnelli commented May 3, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laanwj commented May 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sipa left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maflcko commented May 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

laanwj commented May 4, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jonasschnelli commented May 7, 2018

Master (-g -O2):

This PR (-g O2):

laanwj commented May 2, 2018 •

edited

Loading

laanwj May 2, 2018 •

edited

Loading

laanwj commented May 2, 2018 •

edited

Loading

laanwj May 3, 2018 •

edited

Loading

laanwj May 3, 2018 •

edited

Loading

maflcko left a comment •

edited

Loading

laanwj commented May 3, 2018 •

edited

Loading

maflcko commented May 4, 2018 •

edited

Loading

Master (`-g -O2`):

This PR (`-g O2`):

jonasschnelli commented May 9, 2018 •

edited

Loading

laanwj commented May 13, 2018 •

edited

Loading

laanwj commented May 15, 2018 •

edited

Loading