Skip to content

Conversation

dergoegge
Copy link
Member

This addresses #28107 (comment) from #28107.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 1, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK theStack, stickies-v, TheCharlatan, achow101
Concept ACK glozow

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@fanquake fanquake requested review from ajtowns and glozow November 1, 2023 11:09
@dergoegge
Copy link
Member Author

Can someone please re-run the "CI / macOS 13 native" job, failure seems unrelated.

@fanquake
Copy link
Member

fanquake commented Nov 1, 2023

failure seems unrelated.

Are you sure?

 Traceback (most recent call last):
  File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/test_framework/test_framework.py", line 132, in main
    self.run_test()
  File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/rpc_createmultisig.py", line 65, in run_test
    self.do_multisig()
  File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/rpc_createmultisig.py", line 241, in do_multisig
    tx = node0.sendrawtransaction(rawtx3["hex"], 0)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/test_framework/coverage.py", line 50, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/bitcoin/bitcoin/ci/scratch/build/bitcoin-x86_64-apple-darwin/test/functional/test_framework/authproxy.py", line 129, in __call__
    raise JSONRPCException(response['error'], status)
test_framework.authproxy.JSONRPCException: mandatory-script-verify-flag-failed (Witness program was passed an empty witness) (-26)

@dergoegge
Copy link
Member Author

Are you sure?

Yes, it passes locally.

@fanquake
Copy link
Member

fanquake commented Nov 1, 2023

Want to document it as an intermittent failure?

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK

I had similar thoughts while reviewing 28107 but figured the computation was still fast enough. Given that ComputeWitnessHash is only called once during initialization (and is private) and HasWitness is called frequently, this approach is strictly better.

@fanquake
Copy link
Member

fanquake commented Nov 1, 2023

The rpc_createmultisig.py test is now failing in two other CI jobs and MSAN is unhappy:

==19201==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x563f069e85d0 in bcmp /msan/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:856:10
    #1 0x563f0956bff5 in base_blob<256u>::Compare(base_blob<256u> const&) const src/./uint256.h:54:66
    #2 0x563f0956bff5 in operator!=(base_blob<256u> const&, base_blob<256u> const&) src/./uint256.h:57:89
    #3 0x563f0956bff5 in CTransaction::HasWitness() const src/./primitives/transaction.h:370:33
    #4 0x563f0956bff5 in void SerializeTransaction<CHashWriter, CTransaction>(CTransaction const&, CHashWriter&) src/./primitives/transaction.h:265:16
    #5 0x563f09569863 in void CTransaction::Serialize<CHashWriter>(CHashWriter&) const src/./primitives/transaction.h:326:9
    #6 0x563f09569863 in void Serialize<CHashWriter, CTransaction>(CHashWriter&, CTransaction const&) src/./serialize.h:776:7
    #7 0x563f09569863 in CHashWriter& CHashWriter::operator<<<CTransaction>(CTransaction const&) src/./hash.h:161:9
    #8 0x563f09569863 in CTransaction::ComputeWitnessHash() const src/primitives/transaction.cpp:93:47
    #9 0x563f09569f47 in CTransaction::CTransaction(CMutableTransaction&&) src/primitives/transaction.cpp:97:190
    #10 0x563f06eb3f68 in std::__1::__shared_ptr_emplace<CTransaction const, std::__1::allocator<CTransaction const>>::__shared_ptr_emplace[abi:v170002]<CMutableTransaction>(std::__1::allocator<CTransaction const>, CMutableTransaction&&) /msan/cxx_build/include/c++/v1/__memory/shared_ptr.h:302:37
    #11 0x563f06eb3f68 in std::__1::shared_ptr<CTransaction const> std::__1::allocate_shared[abi:v170002]<CTransaction const, std::__1::allocator<CTransaction const>, CMutableTransaction, void>(std::__1::allocator<CTransaction const> const&, CMutableTransaction&&) /msan/cxx_build/include/c++/v1/__memory/shared_ptr.h:1022:55
    #12 0x563f09071b26 in std::__1::shared_ptr<CTransaction const> std::__1::make_shared[abi:v170002]<CTransaction const, CMutableTransaction, void>(CMutableTransaction&&) /msan/cxx_build/include/c++/v1/__memory/shared_ptr.h:1031:12
    #13 0x563f09071b26 in std::__1::shared_ptr<CTransaction const> MakeTransactionRef<CMutableTransaction>(CMutableTransaction&&) src/./primitives/transaction.h:418:93
    #14 0x563f09071b26 in ChainstateManager::UpdateUncommittedBlockStructures(CBlock&, CBlockIndex const*) const src/validation.cpp:3706:24
    #15 0x563f09073e46 in ChainstateManager::GenerateCoinbaseCommitment(CBlock&, CBlockIndex const*) const src/validation.cpp:3733:5
    #16 0x563f08bb9dd6 in node::BlockAssembler::CreateNewBlock(CScript const&) src/node/miner.cpp:160:69
    #17 0x563f0822b2a5 in TestChain100Setup::CreateBlock(std::__1::vector<CMutableTransaction, std::__1::allocator<CMutableTransaction>> const&, CScript const&, Chainstate&) src/test/util/setup_common.cpp:310:56

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@ajtowns
Copy link
Contributor

ajtowns commented Nov 1, 2023

 #3 0x563f0956bff5 in CTransaction::HasWitness() const src/./primitives/transaction.h:370:33
 #4 0x563f0956bff5 in void SerializeTransaction<CHashWriter, CTransaction>(CTransaction const&, CHashWriter&) src/./primitives/transaction.h:265:16

If it's not obvious, the bug is that m_witness_hash is initialised by serializing the transaction with its witness, but in that case, Serialize calls HasWitness() to work out which format to use, which with this change relies on access to m_witness_hash which is not yet initialised.

@dergoegge dergoegge force-pushed the 2023-10-28107-followup branch from fed7b91 to 6643128 Compare November 1, 2023 14:15
@dergoegge
Copy link
Member Author

Added a std::optional<bool> to CTransaction as a cache for HasWitness.

@stickies-v
Copy link
Contributor

stickies-v commented Nov 1, 2023

I'm not a fan of the std::optional<bool> cache approach, it's an error-prone API imo (even if it's private and small). I think it makes more sense to just calculate the boolean at initialization and follow the same pattern as for hash and m_witness_hash? It has to be calculated for every instance anyway.

As e.g. in stickies-v@2e69a65

Edit: using default initializers makes everything a bit more concise: stickies-v@eb316f9 (diff)

@dergoegge dergoegge force-pushed the 2023-10-28107-followup branch from 6643128 to 6b2bd8d Compare November 1, 2023 15:09
@dergoegge
Copy link
Member Author

Thanks @stickies-v went with your suggestion as my previous idea was buggy again🔥

@dergoegge dergoegge force-pushed the 2023-10-28107-followup branch from 6b2bd8d to af1d2ff Compare November 1, 2023 15:15
@@ -310,12 +310,15 @@ class CTransaction

private:
/** Memory only. */
const bool m_has_witness;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a fan of adding 8 bytes to every CTransaction, though I guess it's not horrible?

Another approach would be to separate out the raw data and compute-logic into one class, and the optimised version with the w/txid cached into a separate class. That looks like: ajtowns@89e3c80

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't think the extra bytes matter that much, so I would lean towards leaving this as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, we're already wasting 16 bytes due to vin and vout supporting capacity() != size() despite both being const and never needing to be resized.

@theStack
Copy link
Contributor

theStack commented Nov 6, 2023

Concept ACK

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks correct.

I'm still unsure on how to assess whether the extra memory needed is okay or not, other than "gut feeling". I would have expected that the increased size of CTransaction (on my system, it's 128 bytes on the PR vs. 120 bytes on master, a growth of ~6.66%) has a tiny impact on the mempool usage accounting as well, but it seems that it doesn't, as RecursiveDynamicUsage only cares about a tx's inputs and outputs fields:

static inline size_t RecursiveDynamicUsage(const CTransaction& tx) {
size_t mem = memusage::DynamicUsage(tx.vin) + memusage::DynamicUsage(tx.vout);
for (std::vector<CTxIn>::const_iterator it = tx.vin.begin(); it != tx.vin.end(); it++) {
mem += RecursiveDynamicUsage(*it);
}
for (std::vector<CTxOut>::const_iterator it = tx.vout.begin(); it != tx.vout.end(); it++) {
mem += RecursiveDynamicUsage(*it);
}
return mem;
}

(A bit off-topic to this PR, but is it really intended to only do dynamic accounting? Just because something doesn't grow dynamically doesn't mean it's completely free, the static parts still have to reside somewhere in memory.)

@sipa
Copy link
Member

sipa commented Nov 12, 2023

@theStack The dynamic memory isn't the only thing that matters, but it's the only thing you don't already account for at another level.

If you have a single transaction, and want its memory usage, use sizeof(transaction) + DynamicMemoryUsage(transaction). It'd be convenient to have the transaction sizeof added to that memory usage. However, things would go wrong if you use a collection then.

Say you have a vector of transactions, and want its memory usage. You'd use sizeof(vector) + dynamic memory of the vector (which consists of the memory allocated by the vector, plus the memory allocated by all of the transactions in it). However, the transactions themselves are in the vector's allocated memory. If they'd be counted alongside their own allocated memory, they'd be counted twice here (and incorrectly, because the overhead of memory allocation only applies once to the entire array, not individually).

@theStack
Copy link
Contributor

theStack commented Nov 12, 2023

@sipa: Thanks for clarifying, that makes sense and I see now why it would be a bad idea to add the static part directly in the DynamicMemoryUsage functions. For the specific case of mempool usage accounting, one could add sizeof(CTransaction) at the call-site though? E.g.

diff --git a/src/kernel/mempool_entry.h b/src/kernel/mempool_entry.h
index 1f175a5ccf..95362fe2d9 100644
--- a/src/kernel/mempool_entry.h
+++ b/src/kernel/mempool_entry.h
@@ -108,7 +108,7 @@ public:
         : tx{tx},
           nFee{fee},
           nTxWeight{GetTransactionWeight(*tx)},
-          nUsageSize{RecursiveDynamicUsage(tx)},
+          nUsageSize{sizeof(*tx) + RecursiveDynamicUsage(tx)},
           nTime{time},
           entry_sequence{entry_sequence},
           entryHeight{entry_height},

(Not saying it's really worth it to fiddle around in this area, and I personally wouldn't feel comfortable to open a PR. I was just very surprised that one could -- in theory -- bloat up CTransaction and it still goes unnoticed for the mempool usage accounting.)

@sipa
Copy link
Member

sipa commented Nov 13, 2023

@theStack (deleted my previous comment, I had missed a lot).

RecursiveDynamicUsage for CTransactionRef (which is a std::shared_ptr<CTransaction>) should already account for both the memory allocated in the shared pointer as well as the memory allocated by the transaction. See src/core_memusage.h:

template<typename X>
static inline size_t RecursiveDynamicUsage(const std::shared_ptr<X>& p) {
    return p ? memusage::DynamicUsage(p) + RecursiveDynamicUsage(*p) : 0;
}

where memusage::DynamicUsage(p) invokes this function from src/memusage.h:

template<typename X>
static inline size_t DynamicUsage(const std::shared_ptr<X>& p)
{
    // A shared_ptr can either use a single continuous memory block for both
    // the counter and the storage (when using std::make_shared), or separate.
    // We can't observe the difference, however, so assume the worst.
    return p ? MallocUsage(sizeof(X)) + MallocUsage(sizeof(stl_shared_counter)) : 0;
}

Is it possible that you just don't see the added field cause changes because either it's fitting in earlier padding space, or because the rounding up effect of malloc() hides the difference?

@theStack
Copy link
Contributor

theStack commented Nov 13, 2023

@sipa: Ah, very interesting, have to admit I completely overlooked RecursiveDynamicUsage(const std::shared_ptr<X>& p).

Is it possible that you just don't see the added field cause changes because either it's fitting in earlier padding space, or because the rounding up effect of malloc() hides the difference?

Yes, it was the latter. The added field did lead to a size increase of CTransaction (from 120 to 128 bytes), but MallocUsage indeed yields the same value for both of these sizes on a 64-bit machine (namely ((120+31)>>4)<<4 = ((128+31)>>4)<<4 = 144). So that's the reason why I did see the same mempool usage accounting results both on master and the PR on my machine (and wrongly concluded from that that the size of CTransaction is not accounted for). To be sure, I intentionally bloated CTransaction up much more with an const uint8_t dummy[100] = {}; entry and verified that the mempool usage was higher indeed. So, the accounting seems to work as expected and my worries were unjustified.

If anyone wants to run these experiments: what I did was to start bitcoind with an existing mempool.dat in datadir both on master and the PR branch with -nolisten -noconnect (to avoid clogging the mempool with new txs), wait until the mempool loaded, ran bitcoin-cli getmempoolinfo and compared the "usage" result field value (other fields like "size", "bytes" and "total_fee" should always be equal for the same mempool.dat).

ACK af1d2ff

(still based on a gut feeling, but garnished with some napkin math: the mempool on my node is close to it's maximum of 300MB and has about ~100k entries now; MallocUsage increases in 16-bytes steps on 64-bit machines, so that would in the worst case lead to ~1,6MB more usage, where now "only" ~99.5k entries would fit; I think that's fine).

@DrahtBot DrahtBot requested a review from stickies-v November 13, 2023 03:02
@DrahtBot DrahtBot requested a review from glozow November 13, 2023 03:02
Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK af1d2ff

I think the PR and commit description would benefit from describing that this is not a strict improvement, as we slightly increase memory usage (which I think is an acceptable trade-off).

@@ -310,12 +310,15 @@ class CTransaction

private:
/** Memory only. */
const bool m_has_witness;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Using default initializer makes more sense here I think:

git diff on af1d2ff
diff --git a/src/primitives/transaction.cpp b/src/primitives/transaction.cpp
index 77a363f7b6..a20cf25f53 100644
--- a/src/primitives/transaction.cpp
+++ b/src/primitives/transaction.cpp
@@ -93,8 +93,10 @@ Wtxid CTransaction::ComputeWitnessHash() const
     return Wtxid::FromUint256((CHashWriter{0} << *this).GetHash());
 }
 
-CTransaction::CTransaction(const CMutableTransaction& tx) : vin(tx.vin), vout(tx.vout), nVersion(tx.nVersion), nLockTime(tx.nLockTime), m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
-CTransaction::CTransaction(CMutableTransaction&& tx) : vin(std::move(tx.vin)), vout(std::move(tx.vout)), nVersion(tx.nVersion), nLockTime(tx.nLockTime), m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
+CTransaction::CTransaction(const CMutableTransaction& tx)
+    : vin(tx.vin), vout(tx.vout), nVersion(tx.nVersion), nLockTime(tx.nLockTime) {}
+CTransaction::CTransaction(CMutableTransaction&& tx)
+    : vin(std::move(tx.vin)), vout(std::move(tx.vout)), nVersion(tx.nVersion), nLockTime(tx.nLockTime) {}
 
 CAmount CTransaction::GetValueOut() const
 {
diff --git a/src/primitives/transaction.h b/src/primitives/transaction.h
index 594168bbcc..facbd8c769 100644
--- a/src/primitives/transaction.h
+++ b/src/primitives/transaction.h
@@ -310,9 +310,9 @@ public:
 
 private:
     /** Memory only. */
-    const bool m_has_witness;
-    const Txid hash;
-    const Wtxid m_witness_hash;
+    const bool m_has_witness{ComputeHasWitness()};
+    const Txid hash{ComputeHash()};
+    const Wtxid m_witness_hash{ComputeWitnessHash()};
 
     Txid ComputeHash() const;
     Wtxid ComputeWitnessHash() const;

@DrahtBot DrahtBot requested review from stickies-v and removed request for stickies-v November 13, 2023 11:49
@dergoegge
Copy link
Member Author

I think this is rfm. Could a maintainer take a look?

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK af1d2ff

Nit: Maybe add a HasWitness check somewhere in the transaction_tests?

@mzumsande
Copy link
Contributor

Since it comes with a memory cost, I wonder if the the performance gain we get for it can be measured. Does this speed up any benchmarks / IBD?

@dergoegge
Copy link
Member Author

Since it comes with a memory cost, I wonder if the the performance gain we get for it can be measured. Does this speed up any benchmarks / IBD?

It's less of a gain and more just trying to revert the performance regression I introduced in #28017. I guess you are asking if it is a noticeable regression. I would think both the performance regression and the new memory cost from here are negligible and I can certainly also live with what is in master now.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you are asking if it is a noticeable regression. I would think both the performance regression and the new memory cost from here are negligible and I can certainly also live with what is in master now.

I think this should be easily benched just with a --reindex-chainstate? I could try that next week if no one else wants to. I just think that if it's possible within reasonable efforts, this kind of decisions should be made based on empirical data rather than guessing what the performance / memory impacts may be.

@dergoegge
Copy link
Member Author

dergoegge commented Nov 24, 2023

this kind of decisions should be made based on empirical data rather than guessing what the performance / memory impacts may be.

This makes it sound like this is all up in the air when the trade-off for this PR is well known. All this does is revert to the performance we had pre-#28107 while adding an extra 8 bytes to every CTransaction instance.

The runtime complexity of HasWitness is now O(tx.vin.size()) while it used to be O(1). That could be considered good enough reasoning for this PR even if a IBD/reindex benchmark doesn't show a noticeable difference. The 4 concept acks and 3 full acks seem to agree on that.

In any case the memory usage of CTransaction could be optimized in a follow-up, even beyond reverting the overhead from this PR (#28766 (comment)).

Feel free to do the benchmark, but I'm not sure if the result matters much at this point, since we have already spend sufficient review time on this PR.

@dergoegge
Copy link
Member Author

dergoegge commented Nov 24, 2023

All this does is revert to the performance we had pre-#28107 while adding an extra > 8 bytes to every CTransaction instance.

The runtime complexity of HasWitness is now O(tx.vin.size()) while it used to be O(1).

Whoops, this isn't really true. HasWitness was unchanged by #28107, the only change in this regard was cdb14d7.

This PR improves the performance for all call sites of HasWitness not just the ones in cdb14d7.

@achow101
Copy link
Member

ACK af1d2ff

@achow101 achow101 merged commit 26b7bcf into bitcoin:master Nov 28, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Nov 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.