Skip to content

Conversation

laanwj
Copy link
Member

@laanwj laanwj commented May 2, 2018

In ProcessGetBlockData, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

@maflcko
Copy link
Member

maflcko commented May 2, 2018

Could you please add a benchmark to ./src/bench/checkblock.cpp, so it is easier to see how much this improves?

@laanwj
Copy link
Member Author

laanwj commented May 2, 2018

Sure, though I'm not sure how to do that; none of the benches actually uses ReadBlockFromDisk, I would have to set up a fake block index or such.

(I also don't think it will work on block413567 as-is because it has no magic/size header, and is not a file on disk, though it's easy enough to write a temporary file of course). [When doing this from memory there's effectively nothing to benchmark, too.]

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

Couldn't we serve corrupted blocks?


try {
block.resize(nSize);
filein.read((char*)block.data(), nSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to throw out the idea, mmap wouldn't pay off right?

Copy link
Member Author

@laanwj laanwj May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's a win here, as the entire block is read consecutively - could even be slower as it'd have to create and destroy the mapping. Also it's not portable.

@maflcko
Copy link
Member

maflcko commented May 2, 2018

Concept ACK. Would be nice to see how much the additional savings are on top of #13098.

@laanwj
Copy link
Member Author

laanwj commented May 2, 2018

Concept ACK. Would be nice to see how much the additional savings are on top of #13098.

At least it's a lot simpler.

@laanwj
Copy link
Member Author

laanwj commented May 2, 2018

Couldn't we serve corrupted blocks?

Yes, that's a possibility, though only if the underlying storage is corrupted. I've posited the idea to add a CRC32C to the on-disk blocks at some point (which is quick to verify, especially with specialized instructions, and should protect against accidental corruptions), but that's quite an invasive change. It's something that could be done later.

The only option to verify with the current information would be to do a Merkle tree check - which could be done without deserialization but it's not pretty... (and a serious overhead SHA256-hashing)

@maflcko
Copy link
Member

maflcko commented May 2, 2018

The only option to verify with the current information would be to do a Merkle tree check

I don't see that we currently do this, so it wouldn't make anything worse here.

@gmaxwell
Copy link
Contributor

gmaxwell commented May 2, 2018

I think serving a corrupted block if our state is corrupted is fine, the peer will just disconnect us and go get the block from someone else, seems pretty harmless!

This is a much smaller change than I was expecting-- in particular I forgot there was a size, light review ACK.

Copy link
Contributor

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK except for the below:

}

try {
block.resize(nSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to check the size is sane before we do this.

Copy link
Member Author

@laanwj laanwj May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. What constant would be appropriate here?
Edit: I'll go with MAX_SIZE from serialize.h.

Copy link
Member Author

@laanwj laanwj May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I wondered here: what is the C++11 proper way to allocate a vector (or a RAII memory area) without zeroing it? I think that's unnecessary here.
That was a bad idea: even though we handle errors while reading, as this data is sent directly over P2P, zeroing is defense-in-depth against heartbleed-style issues here

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will measure some round trips tomorrow.

} else if (inv.type == MSG_WITNESS_BLOCK) {
// Fast-path: in this case it is possible to serve the block directly from disk,
// as the network format matches the format on disk
LogPrintf("debug: Serving raw block directly from disk: %s\n", pindex->ToString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably remove this debug logging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. I added it while WIP so that people testing this can be sure that the code actually triggers and they're testing the right thing.

@jonasschnelli
Copy link
Contributor

utACK 4c790dff7481d1464a906ad6b17a3179a7da3431

This would probably also speedup an external indexing daemon via p2p (see experiment in https://github.com/jonasschnelli/bitcoincore-indexd [very WIP])

Here a flamegraph of serving the first 200k blocks via p2p localhost (though the real deser/ser starts probably at higher up in the chain).


CMessageHeader::MessageStartChars msg_start_in;
unsigned int nSize;
filein >> msg_start_in >> nSize;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this deserialization logic should be within the try {}.

@laanwj laanwj changed the title WIP: net: Serve blocks directly from disk when possible net: Serve blocks directly from disk when possible May 3, 2018
@laanwj
Copy link
Member Author

laanwj commented May 3, 2018

OK, thanks for review everyone, removed WIP tag and pushed commits with the following changes:

  • Remove extraneous debug log message
  • Check nSize against MAX_SIZE
  • Move deserialization of msg_start_in, and size into exception try
  • Improved variable and parameter naming

Will squash if no further issues.

This would probably also speedup an external indexing daemon via p2p (see experiment in https://github.com/jonasschnelli/bitcoincore-indexd [very WIP])

Yeah - I saw in #13098 that @MarcoFalke had also optimized the zmq full-block notification part. I'm not sure that is any critical path performance-wise (maybe for your indexer?) but if so I'll leave that for a later PR.

Here a flamegraph of serving the first 200k blocks via p2p localhost (though the real deser/ser starts probably at higher up in the chain).

Did you forget to post the link?

Copy link
Member

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK after squash.

// as the network format matches the format on disk
std::vector<uint8_t> block_data;
if (!ReadRawBlockFromDisk(block_data, pindex, chainparams.MessageStart()))
assert(!"cannot load block from disk");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: braces around then-branch if on a separate line.

@maflcko
Copy link
Member

maflcko commented May 4, 2018

Running with e0223ebf0c58f7beedea91df48e9586154cd4436 and just looking at the wall clock time for reading+optional deserialization shows for me on an ssd:

ssd

Edit: Note that this was done with full fake blocks and not real blocks from the network.

@laanwj
Copy link
Member Author

laanwj commented May 4, 2018

Thanks for benchmarking @MarcoFalke.

I used a patched version of @jonasschnelli's bitcoincore-indexd to benchmark the time for fetching block 0..473600 through P2P, with no processing client-side. The result is:

With patch:
real    63m51.273s

Without patch:
real    70m28.956s

10% speedup. And in my case is the blocks are on a slow harddisk. I expect the gains to be more significant in case of a faster storage medium, or slower CPU.


block.resize(blk_size); // Zeroing of memory is intentional here
filein.read((char*)block.data(), blk_size);
} catch(const std::exception& e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, space after catch } catch (....

blk_size, MAX_SIZE);
}

block.resize(blk_size); // Zeroing of memory is intentional here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zeroing of memory is intentional

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid heartbleed-type leaks as this data goes directly over the network.

@jonasschnelli
Copy link
Contributor

Did 10 rounds of requesting blocks in range 490'000 up to 500'000 on master and got.
Setup:

  • Non VM machine
  • SSD 1400MB/s
  • Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz
  • txindex was enabled
  • connect=0 --whitebind=127.0.0.1:8333
  • no other resource intense applications where running on that system
  • used a modified version of https://github.com/laanwj/bitcoincore-indexd

Master (-g -O2):

95211ms in avg. (all rounds where very similar and there was no need to exclude the first round)
exemption

This PR (-g O2):

101051ms in avg.

I can't figure out why this PR performs ~6% slower and I double checked the comparison by manually rolling back from the head of this PR to 598db38 and back to head. Also added the LogPrintfs back to ensure I'm using the "fast track" (removed again during benchmarking).

sidenode: found out that -debug=net (which was disabled during the rounds reported above) account for a 3% slowdown for the above test-scenario.

@laanwj
Copy link
Member Author

laanwj commented May 9, 2018

@jonasschnelli That's really strange. As reported, I did see some actual speed-ups where using this.
Maybe someone else can try some measurements, or we should just close, I don't know.

@jonasschnelli
Copy link
Contributor

jonasschnelli commented May 9, 2018

If someone wants to compare master against this PR built in the same environment:

PR: https://bitcoin.jonasschnelli.ch/build/600
master: https://bitcoin.jonasschnelli.ch/build/599

@maflcko
Copy link
Member

maflcko commented May 9, 2018

I updated my benchmark to also include the time it takes to Make (serialize) the net message:

net message serialization times

@maflcko
Copy link
Member

maflcko commented May 9, 2018

I think we should definitively look into why it is slower to sync, since that indicates a problem (potentially in our code) exists elsewhere.

@laanwj
Copy link
Member Author

laanwj commented May 13, 2018

I did the same experiment as @jonasschnelli, a modified bitcoincore-indexd that requests block 490000..500000 (https://github.com/laanwj/bitcoincore-indexd/tree/bench). Tried both cases 5 times;

with patch:
real    0m55.928s
real    0m55.986s
real    0m55.913s
real    0m55.844s
real    0m55.790s

without patch (using the commit before):
real    2m47.673s
real    2m46.329s
real    2m46.634s
real    2m46.413s
real    2m46.458s

~66% speedup. This is with a spinning rust disk not a SSD. This was done on a different computer than my previous test.

@jonasschnelli
Copy link
Contributor

I think this is a clear benefit for spinning disk and probably also for ssd in non absurd localhost cases.

utACK

@maflcko
Copy link
Member

maflcko commented May 14, 2018

@jonasschnelli Did you have a chance to look into why your result was unexpected?

@jonasschnelli
Copy link
Contributor

@MarcoFalke: no. I haven't but I'm willing to do as soon as someone could confirm my results (SSD test).

@maflcko
Copy link
Member

maflcko commented May 14, 2018 via email

@maflcko
Copy link
Member

maflcko commented May 14, 2018

For clarity: 598db means master@598db and 9893e means this pull request. I used @laanwj's branch of bitcoincore-indexd.

10k block fetch times 490k-500k

@maflcko
Copy link
Member

maflcko commented May 14, 2018

@jonasschnelli I coulnd't find the branch you were using. Mind to share, otherwise I can't reproduce?

@laanwj
Copy link
Member Author

laanwj commented May 15, 2018

Same experiment as #13151 (comment) on i.MX6Q ARM board w/ USB2 spinning disk:

with patch:
real    10m18.368s
real    11m14.600s
real    10m12.006s
real    10m21.668s
real    10m11.070s

without patch (using the commit before):
real    27m30.574s
real    26m27.591s
real    25m38.311s
real    25m36.661s
real    25m40.902s

Seems to help even with a slow CPU and slow I/O.

In `ProcessGetBlockData`, send the block data directly from disk if
type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the
on-disk format matches the network format.

This is expected to increase performance because a deserialization and
subsequent serialization roundtrip is avoided.
@laanwj laanwj force-pushed the 2018_05_direct_from_disk branch from 9893e71 to 0bf4318 Compare May 15, 2018 06:13
@laanwj
Copy link
Member Author

laanwj commented May 15, 2018

squashed, no other changes --
9893e712e9e04e8b9478e36e0b5d843899540bd2 → 0bf4318

@jonasschnelli
Copy link
Contributor

I guess my setup was either faulty or there is a performance loss with that particular setup (>1000MB/s IO r&w on very fast CPUs).
However, this PR is a clear and significant win!

@maflcko
Copy link
Member

maflcko commented May 15, 2018

@jonasschnelli I can't explain why, but you might want to try to drop the files cached in memory with sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'. For me this was speeding up the sync on an ssd.

@laanwj laanwj merged commit 0bf4318 into bitcoin:master May 23, 2018
laanwj added a commit that referenced this pull request May 23, 2018
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan)

Pull request description:

  In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

  This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 12, 2020
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan)

Pull request description:

  In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

  This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
Signed-off-by: Pasta <pasta@dashboost.org>

# Conflicts:
#	src/net_processing.cpp
#	src/validation.cpp
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 12, 2020
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan)

Pull request description:

  In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

  This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
Signed-off-by: Pasta <pasta@dashboost.org>

# Conflicts:
#	src/net_processing.cpp
#	src/validation.cpp
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 16, 2020
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan)

Pull request description:

  In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

  This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
Signed-off-by: Pasta <pasta@dashboost.org>

# Conflicts:
#	src/net_processing.cpp
#	src/validation.cpp
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 18, 2020
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan)

Pull request description:

  In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

  This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
Signed-off-by: Pasta <pasta@dashboost.org>

# Conflicts:
#	src/net_processing.cpp
#	src/validation.cpp
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 18, 2020
0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan)

Pull request description:

  In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

  This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
Signed-off-by: Pasta <pasta@dashboost.org>

# Conflicts:
#	src/net_processing.cpp
#	src/validation.cpp
UdjinM6 pushed a commit to dashpay/dash that referenced this pull request Apr 19, 2020
…le (#3408)

* Merge bitcoin#13151: net: Serve blocks directly from disk when possible

0bf4318 net: Serve blocks directly from disk when possible (Wladimir J. van der Laan)

Pull request description:

  In `ProcessGetBlockData`, send the block data directly from disk if type MSG_WITNESS_BLOCK is requested. This is a valid shortcut as the on-disk format matches the network format.

  This is expected to increase performance because a deserialization and subsequent serialization roundtrip is avoided.

Tree-SHA512: 9a9500b4c1354eaae1a6f1c6ef2416c1c1985029852589266f3a70e808f6c7482c135e9ab251a527566935378ab7c32dba4ed43ba5451e802d8e72b77d1ba472
Signed-off-by: Pasta <pasta@dashboost.org>

# Conflicts:
#	src/net_processing.cpp
#	src/validation.cpp

* fix extra bracket

Signed-off-by: Pasta <pasta@dashboost.org>

* remove read raw block from disk functions

Signed-off-by: Pasta <pasta@dashboost.org>

* correct `if (pblock)` scope

Signed-off-by: Pasta <pasta@dashboost.org>

Co-authored-by: Wladimir J. van der Laan <laanwj@gmail.com>
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants