Skip tx-rehashing on historic blocks #13098

maflcko · 2018-04-26T20:37:27Z

When serving historic blocks, there is no need to calculate the hash of each transaction, so just skip it.

This is mostly refactoring with the following changes:

Introduce a CPureTransaction, a stripped down version of CTransaction (no hash or GetHash() members). Then derive class CTransaction from CPureTransaction
Deduce a CPureBlock which references CPureTransactions in its vtx.

Deserialization now only takes 14% of the wall clock time it took previously on my machine. You can check locally with:

./bench_bitcoin --filter=DeserializeBlockTest && ./bench_bitcoin --filter=DeserializePureBlockTest

Sending a single historic block now takes slightly less (90% of the wall clock time it took previously) on a local network. I presume the speedup is more pronounced when sending more than one block.

maflcko · 2018-04-26T21:38:16Z

The build failure is due to an eight year old bug in clang: https://bugs.llvm.org/show_bug.cgi?id=8263

sdaftuar · 2018-04-27T13:04:53Z

I like the goal here but I find the changes a bit overwhelming for achieving this. It seems like we could achieve this more easily in other ways, like (1) just building the hash on demand the first time it's requested, or (2) adding a deserialization flag to CTransaction that instructs it to skip the hash if we're not planning to use it?

[I'm not sure if (1) is a great solution, since there might be some memory-cache-benefits to building the hash at the time of deserialization, and maybe no one likes the design of having a mutable hash inside of CTransaction, but it would be a super-small diff...]

maflcko · 2018-04-27T14:24:19Z

I didn't particularly like option (1) because it makes the caching run-time depended (it is always compiled in, but only used when it happens to be requested), compared to having it compile time enforced by typing CPureTransaction instead of CTransaction.

Note that my approach is a variation of option (2) by passing through the type of the transaction to ReadBlockFromDisk instead of passing through an additional flag that is converted to a deserialization option and then passed on to the constructor of the transaction.

So I think having explicit types is advantageous in both code clarity and cleanness, with the tradeoff of a minimally larger diff.

Note: I will push a fixed up version of the patch by next Monday to minimize the diff and (hopefully) work around the gcc and clang compiler bugs.

sdaftuar · 2018-04-27T14:40:46Z

Fair enough, I just don't want us to overlook simpler solutions that might be acceptable. I'm happy to let the reviewers with stronger feelings about the design of these data types chime in.

maflcko · 2018-04-27T15:09:52Z

I just pushed a version that might compile on our buggy clang and that is according to the design that @sipa requested in #13011 (comment)

Then at the very least introduce alternate versions of CTransaction/CBlock that don't precompute anything, for use in reindexing and rescanning and serving blocks from disk.

sipa · 2018-04-27T20:57:42Z

@MarcoFalke Thanks for working on this, I'll review soon.

@sdaftuar My preference is having separate types, for these reasons:

Having lazily-computed hashes in transactions involves synchronization structures (atomics at least to avoid multiple threads simultaneously getting a hash for the first time).
Smaller in memory in places where it's not needed.
We have the option of later moving the with-hash types into consensus code and not exposing it.

gmaxwell · 2018-05-01T18:15:06Z

This seems like very useful work, but for serving out historic blocks it would perhaps be better if we could (largely) avoid deserializing and re-serializing them entirely as doing so is tremendously expensive (involving a billionity mallocs). Better is better however.

laanwj · 2018-05-02T11:09:06Z

Having lazily-computed hashes in transactions involves synchronization structures (atomics at least to avoid multiple threads simultaneously getting a hash for the first time).

Agree. The main thing I don't like about lazy evaluation is that it's much harder to reason about performance then. I think I prefer the current explicit solution, even though much more verbose.

This seems like very useful work, but for serving out historic blocks it would perhaps be better if we could (largely) avoid deserializing and re-serializing them entirely as doing so is tremendously expensive (involving a billionity mallocs). Better is better however.

Yes - I was also about to say this. How difficult would it be to serve the data directly from disk, without serialization/deserialization roundtrip?

~~Edit: to do this efficiently would require a change to the on-disk format to add the size, or adding the on-disk block size to the block index.~~ OH this is actually already there, situated before the nPos:

    // Write index header
    unsigned int nSize = GetSerializeSize(fileout, block);
    fileout << messageStart << nSize;

laanwj · 2018-05-02T12:17:59Z

but for serving out historic blocks it would perhaps be better if we could (largely) avoid deserializing and re-serializing them entirely

My dumb try at this: #13151

gmaxwell · 2018-05-02T18:08:30Z

Even with 13151 something like this might very useful for making rescan faster... rescan is slow enough to make it largely unusable now. I'd been thinking deserialization was worse than hashing in it, but the performance numbers above make me wonder.

TheBlueMatt · 2018-05-02T19:57:27Z

Certainly Concept ACK if we can get rescan up for this (even as a separate PR), just using it for serving blocks for old nodes is maybe less worth it (at least post-13151).

maflcko · 2018-05-02T20:00:35Z

Will do a rebase once the other pull is merged

jonasschnelli · 2018-05-03T06:44:48Z

Nice work!
Concept ACK

sipa · 2018-05-23T23:16:40Z

Any reason not to reuse CMutableTransaction for the purposes where CPureTransaction is introduced here? The only reason why CTransaction is immutable is to not interfere with or require synchronization on its precomputed hashes. As CMutableTransaction has no cashed hashes, this is not an issue.

maflcko · 2018-05-24T15:54:42Z

@sipa I'd prefer to make it "type-safe", i.e. make it impossible/compile-time enforced that none of the expensive operations are called. Otherwise this seems too fragile.

I think to make it compile-time enforced, this would require to change one of transaction types to be a class that takes template arguments and then add a static assert in Get*Hash. I am happy to change to that implementation.

sipa · 2018-05-24T16:04:52Z

@MarcoFalke But as long as you keep CMutableTransaction around, such a guarantee doesn't really exist, as that class still has uncached hashing operations (and should probably remain so, see a recent PR that speeds up signing by avoiding CMutableTransaction -> CTransaction conversions).

My preference is just to evolve CTransaction into the validation-logic-optimized-efficient transaction representation, and CMutableTransaction into the simple-generic representation. That may mean that uncached hashes are still invoked in places where performance doesn't matter that much.

sipa · 2018-05-24T16:59:38Z

After discussing this a bit with @MarcoFalke, I see the advantage of being able to enforce through the type system that no GetHash is unexpectedly called on transactions that don't have a cache in certain specific cases (like serving historical blocks).

On the other hand, it's also introducing yet another type for the same data, which seems a high price. As it has const fields, conversion to other transaction types will be expensive if it's ever needed, but it does get the ability to take advantage of more efficient storage if they get implemented (like one malloc per transaction).

Overall, I think my preference is to reuse CMutableTransaction here, but if we want the extra guarantees and an extra type, the current approach seems fine.

ryanofsky · 2018-09-13T20:49:21Z

src/net_processing.cpp

                    } else {
-                        CBlockHeaderAndShortTxIDs cmpctblock(*pblock, fPeerWantsWitness);
-                        connman->PushMessage(pfrom, msgMaker.Make(nSendFlags, NetMsgType::CMPCTBLOCK, cmpctblock));
+                        if (!ReadBlockFromDisk(block_read, pindex, consensusParams)) assert(!"cannot load block from disk");


It seems like when !pblock && inv.type == MSG_FILTERED_BLOCK or !pblock && inv.type == MSG_CMPCT_BLOCK in previous code, the message would be dropped, but now the block gets read from disk? Is this an intentional change if so?

ryanofsky

Conditional utACK 775202d if ProcessGetBlockData behavior is not a problem, or can be left unchanged.

practicalswift · 2018-09-19T07:35:39Z

src/primitives/transaction.h

@@ -283,6 +288,43 @@ class CTransaction
    const int32_t nVersion;
    const uint32_t nLockTime;

+    /** Convert a CMutableTransaction into a CPureTransaction. */
+    CPureTransaction(const CMutableTransaction& tx);


2018-09-18 21:44:48 cppcheck(pr=13098): [src/primitives/transaction.h:292]: (style) Class 'CPureTransaction' has a constructor with 1 argument that is not explicit.

practicalswift · 2018-09-19T07:35:50Z

src/primitives/transaction.h

@@ -283,6 +288,43 @@ class CTransaction
    const int32_t nVersion;
    const uint32_t nLockTime;

+    /** Convert a CMutableTransaction into a CPureTransaction. */
+    CPureTransaction(const CMutableTransaction& tx);
+    CPureTransaction(CMutableTransaction&& tx);


2018-09-18 21:44:48 cppcheck(pr=13098): [src/primitives/transaction.h:293]: (style) Class 'CPureTransaction' has a constructor with 1 argument that is not explicit.

practicalswift · 2018-09-19T07:36:09Z

src/bench/checkblock.cpp

+    while (state.KeepRunning()) {
+        CPureBlock block;
+        stream >> block;
+        assert(stream.Rewind(sizeof(block_bench::block413567)));


2018-09-18 21:44:48 cppcheck(pr=13098): [src/bench/checkblock.cpp:31]: (warning) Assert statement calls a function which may have desired side effects: 'Rewind'.

DrahtBot · 2018-09-20T22:38:38Z

Reviewers, this pull request conflicts with the following ones:

[refactoring] Create common interface for CMutableTransaction and CTransaction with static polymorphism #14337 ([refactoring] Create common interface for CMutableTransaction and CTransaction with static polymorphism by lucash-dev)
Annotate unused parameters with [[maybe_unused]] #14194 (Annotate unused parameters with [[maybe_unused]] by practicalswift)
validation: Document where we are intentionally ignoring bool return values from validation related functions #13864 (validation: Document where we are intentionally ignoring bool return values from validation related functions by practicalswift)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

ryanofsky · 2018-10-19T15:19:58Z

Curious why this was closed.

maflcko · 2019-11-20T04:00:36Z

Rebased (with the p2p changes removed) here: #17529

maflcko added the P2P label Apr 26, 2018

maflcko force-pushed the Mf1804-readPureBlock branch from eccacd5 to 49f3437 Compare April 27, 2018 15:02

maflcko force-pushed the Mf1804-readPureBlock branch 2 times, most recently from fac161a to fa7f987 Compare April 27, 2018 19:18

maflcko mentioned this pull request May 2, 2018

net: Serve blocks directly from disk when possible #13151

Merged

maflcko force-pushed the Mf1804-readPureBlock branch from fa7f987 to e933553 Compare May 4, 2018 00:49

TheBlueMatt mentioned this pull request May 10, 2018

Cache witness hash in CTransaction #13011

Merged

maflcko force-pushed the Mf1804-readPureBlock branch 2 times, most recently from 0caab5a to 1dd43e8 Compare May 23, 2018 19:48

DrahtBot mentioned this pull request Jun 5, 2018

rpc: Add submitheader #13399

Merged

DrahtBot mentioned this pull request Jun 28, 2018

BIP 174 PSBT Serializations and RPCs #13557

Merged

DrahtBot added the Needs rebase label Jul 18, 2018

maflcko force-pushed the Mf1804-readPureBlock branch from 1dd43e8 to 506c092 Compare September 6, 2018 16:54

maflcko force-pushed the Mf1804-readPureBlock branch 3 times, most recently from 0650b71 to f18c222 Compare September 6, 2018 17:02

Skip tx-rehashing on historic blocks

775202d

maflcko force-pushed the Mf1804-readPureBlock branch from f18c222 to 775202d Compare September 6, 2018 17:11

This was referenced Sep 6, 2018

fix various uses of inline #14158

Closed

refactor: Make explicit CMutableTransaction -> CTransaction conversion. #14156

Closed

DrahtBot removed the Needs rebase label Sep 6, 2018

ryanofsky reviewed Sep 13, 2018

View reviewed changes

DrahtBot mentioned this pull request Sep 17, 2018

Remove unnamed block in SendMessages function #14238

Closed

practicalswift reviewed Sep 19, 2018

View reviewed changes

DrahtBot mentioned this pull request Sep 27, 2018

[refactoring] Create common interface for CMutableTransaction and CTransaction with static polymorphism #14337

Closed

maflcko closed this Oct 1, 2018

maflcko deleted the Mf1804-readPureBlock branch October 1, 2018 02:04

maflcko added the Up for grabs label Oct 20, 2018

maflcko removed the Up for grabs label Nov 15, 2019

bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021

Skip tx-rehashing on historic blocks #13098

Skip tx-rehashing on historic blocks #13098

Uh oh!

Conversation

maflcko commented Apr 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko commented Apr 26, 2018

Uh oh!

sdaftuar commented Apr 27, 2018

Uh oh!

maflcko commented Apr 27, 2018

Uh oh!

sdaftuar commented Apr 27, 2018

Uh oh!

maflcko commented Apr 27, 2018

Uh oh!

sipa commented Apr 27, 2018

Uh oh!

gmaxwell commented May 1, 2018

Uh oh!

laanwj commented May 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

laanwj commented May 2, 2018

Uh oh!

gmaxwell commented May 2, 2018

Uh oh!

TheBlueMatt commented May 2, 2018

Uh oh!

maflcko commented May 2, 2018

Uh oh!

jonasschnelli commented May 3, 2018

Uh oh!

sipa commented May 23, 2018

Uh oh!

maflcko commented May 24, 2018

Uh oh!

sipa commented May 24, 2018

Uh oh!

sipa commented May 24, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ryanofsky Sep 13, 2018

Choose a reason for hiding this comment

Uh oh!

ryanofsky left a comment

Choose a reason for hiding this comment

Uh oh!

practicalswift Sep 19, 2018

Choose a reason for hiding this comment

Uh oh!

practicalswift Sep 19, 2018

Choose a reason for hiding this comment

Uh oh!

practicalswift Sep 19, 2018

Choose a reason for hiding this comment

Uh oh!

DrahtBot commented Sep 20, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ryanofsky commented Oct 19, 2018

Uh oh!

maflcko commented Nov 20, 2019

Uh oh!

Uh oh!

maflcko commented Apr 26, 2018 •

edited

Loading

laanwj commented May 2, 2018 •

edited

Loading

sipa commented May 24, 2018 •

edited

Loading

DrahtBot commented Sep 20, 2018 •

edited

Loading