-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Improve peformance of CTransaction::HasWitness (28107 follow-up) #28766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
Can someone please re-run the "CI / macOS 13 native" job, failure seems unrelated. |
Are you sure? Traceback (most recent call last):
File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/test_framework/test_framework.py", line 132, in main
self.run_test()
File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/rpc_createmultisig.py", line 65, in run_test
self.do_multisig()
File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/rpc_createmultisig.py", line 241, in do_multisig
tx = node0.sendrawtransaction(rawtx3["hex"], 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/test_framework/coverage.py", line 50, in __call__
return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/test_framework/authproxy.py", line 129, in __call__
raise JSONRPCException(response['error'], status)
test_framework.authproxy.JSONRPCException: mandatory-script-verify-flag-failed (Witness program was passed an empty witness) (-26) |
Yes, it passes locally. |
Want to document it as an intermittent failure? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK
I had similar thoughts while reviewing 28107 but figured the computation was still fast enough. Given that ComputeWitnessHash
is only called once during initialization (and is private) and HasWitness
is called frequently, this approach is strictly better.
The ==19201==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x563f069e85d0 in bcmp /msan/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:856:10
#1 0x563f0956bff5 in base_blob<256u>::Compare(base_blob<256u> const&) const src/./uint256.h:54:66
#2 0x563f0956bff5 in operator!=(base_blob<256u> const&, base_blob<256u> const&) src/./uint256.h:57:89
#3 0x563f0956bff5 in CTransaction::HasWitness() const src/./primitives/transaction.h:370:33
#4 0x563f0956bff5 in void SerializeTransaction<CHashWriter, CTransaction>(CTransaction const&, CHashWriter&) src/./primitives/transaction.h:265:16
#5 0x563f09569863 in void CTransaction::Serialize<CHashWriter>(CHashWriter&) const src/./primitives/transaction.h:326:9
#6 0x563f09569863 in void Serialize<CHashWriter, CTransaction>(CHashWriter&, CTransaction const&) src/./serialize.h:776:7
#7 0x563f09569863 in CHashWriter& CHashWriter::operator<<<CTransaction>(CTransaction const&) src/./hash.h:161:9
#8 0x563f09569863 in CTransaction::ComputeWitnessHash() const src/primitives/transaction.cpp:93:47
#9 0x563f09569f47 in CTransaction::CTransaction(CMutableTransaction&&) src/primitives/transaction.cpp:97:190
#10 0x563f06eb3f68 in std::__1::__shared_ptr_emplace<CTransaction const, std::__1::allocator<CTransaction const>>::__shared_ptr_emplace[abi:v170002]<CMutableTransaction>(std::__1::allocator<CTransaction const>, CMutableTransaction&&) /msan/cxx_build/include/c++/v1/__memory/shared_ptr.h:302:37
#11 0x563f06eb3f68 in std::__1::shared_ptr<CTransaction const> std::__1::allocate_shared[abi:v170002]<CTransaction const, std::__1::allocator<CTransaction const>, CMutableTransaction, void>(std::__1::allocator<CTransaction const> const&, CMutableTransaction&&) /msan/cxx_build/include/c++/v1/__memory/shared_ptr.h:1022:55
#12 0x563f09071b26 in std::__1::shared_ptr<CTransaction const> std::__1::make_shared[abi:v170002]<CTransaction const, CMutableTransaction, void>(CMutableTransaction&&) /msan/cxx_build/include/c++/v1/__memory/shared_ptr.h:1031:12
#13 0x563f09071b26 in std::__1::shared_ptr<CTransaction const> MakeTransactionRef<CMutableTransaction>(CMutableTransaction&&) src/./primitives/transaction.h:418:93
#14 0x563f09071b26 in ChainstateManager::UpdateUncommittedBlockStructures(CBlock&, CBlockIndex const*) const src/validation.cpp:3706:24
#15 0x563f09073e46 in ChainstateManager::GenerateCoinbaseCommitment(CBlock&, CBlockIndex const*) const src/validation.cpp:3733:5
#16 0x563f08bb9dd6 in node::BlockAssembler::CreateNewBlock(CScript const&) src/node/miner.cpp:160:69
#17 0x563f0822b2a5 in TestChain100Setup::CreateBlock(std::__1::vector<CMutableTransaction, std::__1::allocator<CMutableTransaction>> const&, CScript const&, Chainstate&) src/test/util/setup_common.cpp:310:56 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
If it's not obvious, the bug is that |
fed7b91
to
6643128
Compare
Added a |
I'm not a fan of the
Edit: using default initializers makes everything a bit more concise: stickies-v@eb316f9 (diff) |
6643128
to
6b2bd8d
Compare
Thanks @stickies-v went with your suggestion as my previous idea was buggy again🔥 |
6b2bd8d
to
af1d2ff
Compare
@@ -310,12 +310,15 @@ class CTransaction | |||
|
|||
private: | |||
/** Memory only. */ | |||
const bool m_has_witness; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really a fan of adding 8 bytes to every CTransaction
, though I guess it's not horrible?
Another approach would be to separate out the raw data and compute-logic into one class, and the optimised version with the w/txid cached into a separate class. That looks like: ajtowns@89e3c80
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I don't think the extra bytes matter that much, so I would lean towards leaving this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really, we're already wasting 16 bytes due to vin
and vout
supporting capacity() != size()
despite both being const
and never needing to be resized.
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks correct.
I'm still unsure on how to assess whether the extra memory needed is okay or not, other than "gut feeling". I would have expected that the increased size of CTransaction
(on my system, it's 128 bytes on the PR vs. 120 bytes on master, a growth of ~6.66%) has a tiny impact on the mempool usage accounting as well, but it seems that it doesn't, as RecursiveDynamicUsage
only cares about a tx's inputs and outputs fields:
Lines 32 to 41 in 8243762
static inline size_t RecursiveDynamicUsage(const CTransaction& tx) { | |
size_t mem = memusage::DynamicUsage(tx.vin) + memusage::DynamicUsage(tx.vout); | |
for (std::vector<CTxIn>::const_iterator it = tx.vin.begin(); it != tx.vin.end(); it++) { | |
mem += RecursiveDynamicUsage(*it); | |
} | |
for (std::vector<CTxOut>::const_iterator it = tx.vout.begin(); it != tx.vout.end(); it++) { | |
mem += RecursiveDynamicUsage(*it); | |
} | |
return mem; | |
} |
(A bit off-topic to this PR, but is it really intended to only do dynamic accounting? Just because something doesn't grow dynamically doesn't mean it's completely free, the static parts still have to reside somewhere in memory.)
@theStack The dynamic memory isn't the only thing that matters, but it's the only thing you don't already account for at another level. If you have a single transaction, and want its memory usage, use Say you have a vector of transactions, and want its memory usage. You'd use |
@sipa: Thanks for clarifying, that makes sense and I see now why it would be a bad idea to add the static part directly in the diff --git a/src/kernel/mempool_entry.h b/src/kernel/mempool_entry.h
index 1f175a5ccf..95362fe2d9 100644
--- a/src/kernel/mempool_entry.h
+++ b/src/kernel/mempool_entry.h
@@ -108,7 +108,7 @@ public:
: tx{tx},
nFee{fee},
nTxWeight{GetTransactionWeight(*tx)},
- nUsageSize{RecursiveDynamicUsage(tx)},
+ nUsageSize{sizeof(*tx) + RecursiveDynamicUsage(tx)},
nTime{time},
entry_sequence{entry_sequence},
entryHeight{entry_height}, (Not saying it's really worth it to fiddle around in this area, and I personally wouldn't feel comfortable to open a PR. I was just very surprised that one could -- in theory -- bloat up |
@theStack (deleted my previous comment, I had missed a lot).
template<typename X>
static inline size_t RecursiveDynamicUsage(const std::shared_ptr<X>& p) {
return p ? memusage::DynamicUsage(p) + RecursiveDynamicUsage(*p) : 0;
}
where template<typename X>
static inline size_t DynamicUsage(const std::shared_ptr<X>& p)
{
// A shared_ptr can either use a single continuous memory block for both
// the counter and the storage (when using std::make_shared), or separate.
// We can't observe the difference, however, so assume the worst.
return p ? MallocUsage(sizeof(X)) + MallocUsage(sizeof(stl_shared_counter)) : 0;
} Is it possible that you just don't see the added field cause changes because either it's fitting in earlier padding space, or because the rounding up effect of |
@sipa: Ah, very interesting, have to admit I completely overlooked
Yes, it was the latter. The added field did lead to a size increase of If anyone wants to run these experiments: what I did was to start bitcoind with an existing mempool.dat in datadir both on master and the PR branch with ACK af1d2ff (still based on a gut feeling, but garnished with some napkin math: the mempool on my node is close to it's maximum of 300MB and has about ~100k entries now; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK af1d2ff
I think the PR and commit description would benefit from describing that this is not a strict improvement, as we slightly increase memory usage (which I think is an acceptable trade-off).
@@ -310,12 +310,15 @@ class CTransaction | |||
|
|||
private: | |||
/** Memory only. */ | |||
const bool m_has_witness; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Using default initializer makes more sense here I think:
git diff on af1d2ff
diff --git a/src/primitives/transaction.cpp b/src/primitives/transaction.cpp
index 77a363f7b6..a20cf25f53 100644
--- a/src/primitives/transaction.cpp
+++ b/src/primitives/transaction.cpp
@@ -93,8 +93,10 @@ Wtxid CTransaction::ComputeWitnessHash() const
return Wtxid::FromUint256((CHashWriter{0} << *this).GetHash());
}
-CTransaction::CTransaction(const CMutableTransaction& tx) : vin(tx.vin), vout(tx.vout), nVersion(tx.nVersion), nLockTime(tx.nLockTime), m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
-CTransaction::CTransaction(CMutableTransaction&& tx) : vin(std::move(tx.vin)), vout(std::move(tx.vout)), nVersion(tx.nVersion), nLockTime(tx.nLockTime), m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
+CTransaction::CTransaction(const CMutableTransaction& tx)
+ : vin(tx.vin), vout(tx.vout), nVersion(tx.nVersion), nLockTime(tx.nLockTime) {}
+CTransaction::CTransaction(CMutableTransaction&& tx)
+ : vin(std::move(tx.vin)), vout(std::move(tx.vout)), nVersion(tx.nVersion), nLockTime(tx.nLockTime) {}
CAmount CTransaction::GetValueOut() const
{
diff --git a/src/primitives/transaction.h b/src/primitives/transaction.h
index 594168bbcc..facbd8c769 100644
--- a/src/primitives/transaction.h
+++ b/src/primitives/transaction.h
@@ -310,9 +310,9 @@ public:
private:
/** Memory only. */
- const bool m_has_witness;
- const Txid hash;
- const Wtxid m_witness_hash;
+ const bool m_has_witness{ComputeHasWitness()};
+ const Txid hash{ComputeHash()};
+ const Wtxid m_witness_hash{ComputeWitnessHash()};
Txid ComputeHash() const;
Wtxid ComputeWitnessHash() const;
I think this is rfm. Could a maintainer take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK af1d2ff
Nit: Maybe add a HasWitness
check somewhere in the transaction_tests
?
Since it comes with a memory cost, I wonder if the the performance gain we get for it can be measured. Does this speed up any benchmarks / IBD? |
It's less of a gain and more just trying to revert the performance regression I introduced in #28017. I guess you are asking if it is a noticeable regression. I would think both the performance regression and the new memory cost from here are negligible and I can certainly also live with what is in master now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you are asking if it is a noticeable regression. I would think both the performance regression and the new memory cost from here are negligible and I can certainly also live with what is in master now.
I think this should be easily benched just with a --reindex-chainstate
? I could try that next week if no one else wants to. I just think that if it's possible within reasonable efforts, this kind of decisions should be made based on empirical data rather than guessing what the performance / memory impacts may be.
This makes it sound like this is all up in the air when the trade-off for this PR is well known. All this does is revert to the performance we had pre-#28107 while adding an extra 8 bytes to every CTransaction instance. The runtime complexity of In any case the memory usage of CTransaction could be optimized in a follow-up, even beyond reverting the overhead from this PR (#28766 (comment)). Feel free to do the benchmark, but I'm not sure if the result matters much at this point, since we have already spend sufficient review time on this PR. |
Whoops, this isn't really true. HasWitness was unchanged by #28107, the only change in this regard was cdb14d7. This PR improves the performance for all call sites of HasWitness not just the ones in cdb14d7. |
ACK af1d2ff |
This addresses #28107 (comment) from #28107.