Skip to content

Conversation

AngusP
Copy link
Contributor

@AngusP AngusP commented Jun 6, 2024

This test uses the extra_txn (vExtraTxnForCompact) vector of optional orphan/conflicted/etc. transactions to provide transactions to a PartiallyDownloadedBlock that are not otherwise present in the mempool, and check that they are used.

This also covers a former nullptr deref bug that was fixed in #29752 (bf031a5) where the extra_txn vec/circular-buffer was null-initialized and not yet filled when dereferenced in PartiallyDownloadedBlock::InitData.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 6, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK marcofleon, dergoegge, glozow
Concept ACK instagibbs

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29625 (Several randomness improvements by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the Tests label Jun 6, 2024
This new test uses the `vExtraTxnForCompact` (`extra_txn`) vector of
optional orphan/conflicted/etc. transactions to provide a transaction
in a compact block that was not otherwise present in our mempool.

This also covers an improbable nullptr deref bug addressed in
bf031a5 (bitcoin#29752) where the
`extra_txn` vec/circular-buffer was sometimes null-initialized and
not yet filled when dereferenced in `PartiallyDownloadedBlock::InitData`.
@AngusP AngusP force-pushed the 2024-06-cmpctblk-extra-txn-test branch from b9e1704 to 4c99301 Compare June 6, 2024 12:08
@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 6, 2024

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/25888431395

@DrahtBot DrahtBot removed the CI failed label Jun 6, 2024
Copy link
Member

@instagibbs instagibbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concept ACK!

// Add an unrelated tx to extra_txn:
extra_txn[0] = non_block_tx;
// and a tx from the block that's not in the mempool:
extra_txn[1] = block.vtx[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another useful case is a transaction that's also in your mempool

Copy link
Contributor Author

@AngusP AngusP Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L325 has pool.addUnchecked(entry.FromTx(block.vtx[2])); which adds a tx from the block to the mempool, giving .IsTxAvailable(2) (.IsTxAvailable(0) should always be true as that's the coinbase which is in the compact block header, IIUC).

SimpleRoundTripTest and a couple others in here also already covered the add to mempool case.

Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 4c99301

I've verified that this can reproduce the nullptr deref bug by reverting a8203e9 and making short id collisions more likely (see top commits on: https://github.com/dergoegge/bitcoin/commits/2024-06-cmpctblk-extra-txn-test/).

$ n=0; while (( n++ < 100 )); do src/test/test_bitcoin --run_test=blockencodings_tests/ReceiveWithExtraTransactions ; done
...
Running 1 test case...
unknown location(0): fatal error: in "blockencodings_tests/ReceiveWithExtraTransactions": memory access violation at address: 0x00000059: no mapping at fault address
../../src/test/blockencodings_tests.cpp(333): last checkpoint

*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
...

While it might be extremely rare, I believe this test is flaky due to possible short id collisions. Flakes can be observed by cherry picking just dergoegge@540fb4e (i.e. increasing the likelyhood of collisions):

$ n=0; while (( n++ < 100 )); do src/test/test_bitcoin --run_test=blockencodings_tests/ReceiveWithExtraTransactions ; done
...
Running 1 test case...
../../src/test/blockencodings_tests.cpp(347): error: in "blockencodings_tests/ReceiveWithExtraTransactions": check partial_block_with_extra.IsTxAvailable(2) has failed

*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
...
Running 1 test case...
../../src/test/blockencodings_tests.cpp(333): error: in "blockencodings_tests/ReceiveWithExtraTransactions": check partial_block.InitData(cmpctblock, extra_txn) == READ_STATUS_OK has failed
../../src/test/blockencodings_tests.cpp(336): error: in "blockencodings_tests/ReceiveWithExtraTransactions": check partial_block.IsTxAvailable(2) has failed
../../src/test/blockencodings_tests.cpp(343): error: in "blockencodings_tests/ReceiveWithExtraTransactions": check partial_block_with_extra.InitData(cmpctblock, extra_txn) == READ_STATUS_OK has failed
../../src/test/blockencodings_tests.cpp(346): error: in "blockencodings_tests/ReceiveWithExtraTransactions": check partial_block_with_extra.IsTxAvailable(1) has failed
../../src/test/blockencodings_tests.cpp(347): error: in "blockencodings_tests/ReceiveWithExtraTransactions": check partial_block_with_extra.IsTxAvailable(2) has failed

*** 5 failures are detected in the test module "Bitcoin Core Test Suite"

Not sure if that is worth addressing since it's so rare but perhaps simply adding a comment that the test might fail every 2^48 runs is good?

@DrahtBot DrahtBot requested a review from instagibbs June 6, 2024 15:18
@AngusP
Copy link
Contributor Author

AngusP commented Jun 7, 2024

Not sure if that is worth addressing since it's so rare but perhaps simply adding a comment that the test might fail every 2^48 runs is good?

I think it's worth avoiding, I don't really like the notion of flaky tests maybe failing for a random reason, even if it is very improbable. I'd guess short-ID collisions would also affect some of the other existing tests in here so there's additional benefit in them not being flaky too.

@glozow
Copy link
Member

glozow commented Jun 7, 2024

Concept ACK

Not sure if that is worth addressing since it's so rare but perhaps simply adding a comment that the test might fail every 2^48 runs is good?

I think it's worth avoiding, I don't really like the notion of flaky tests maybe failing for a random reason, even if it is very improbable. I'd guess short-ID collisions would also affect some of the other existing tests in here so there's additional benefit in them not being flaky too.

Not sure exactly how flaky it is and don't want to blow this out of proportion. Comment sounds good to save a future debugger from headache. If we might hit it, you can knock it down closer to 0 by running it e.g. 5 times and asserting that it doesn't fail every time, like in coinselector_tests:

// Again, we only create the wallet once to save time, but we still run the coin selection RUN_TESTS times.
for (int i = 0; i < RUN_TESTS; i++) {
// picking 50 from 100 coins doesn't depend on the shuffle,
// but does depend on randomness in the stochastic approximation code
const auto result25 = KnapsackSolver(GroupCoins(available_coins.All()), 50 * COIN, CENT);
BOOST_CHECK(result25);
const auto result26 = KnapsackSolver(GroupCoins(available_coins.All()), 50 * COIN, CENT);
BOOST_CHECK(result26);
BOOST_CHECK(!EqualResult(*result25, *result26));
int fails = 0;
for (int j = 0; j < RANDOM_REPEATS; j++)
{
// Test that the KnapsackSolver selects randomly from equivalent coins (same value and same input size).
// When choosing 1 from 100 identical coins, 1% of the time, this test will choose the same coin twice
// which will cause it to fail.
// To avoid that issue, run the test RANDOM_REPEATS times and only complain if all of them fail
const auto result27 = KnapsackSolver(GroupCoins(available_coins.All()), COIN, CENT);
BOOST_CHECK(result27);
const auto result28 = KnapsackSolver(GroupCoins(available_coins.All()), COIN, CENT);
BOOST_CHECK(result28);
if (EqualResult(*result27, *result28))
fails++;
}
BOOST_CHECK_NE(fails, RANDOM_REPEATS);
}

@maflcko
Copy link
Member

maflcko commented Jun 7, 2024

Would it be difficult to make the test deterministic?

@AngusP
Copy link
Contributor Author

AngusP commented Jun 12, 2024

Would it be difficult to make the test deterministic?

Yes, the straightforward way would be to replace the InsecureRand256() with fixed hashes that don't result in transaction short IDs that collide. Less-straightforward but maybe better would be to keep using random hashes but check for a short ID collision and try again with a different random hash until the collision is gone -- or start with one random hash and change it in a way that guarantees the short ID (siphash) will be different, but I'm not sure if siphash is easily 'attackable' in that sense.

@maflcko
Copy link
Member

maflcko commented Jun 12, 2024

InsecureRand256

I'd say longer them those global non-deterministic functions should go away anyway. Not something you need to do here, but if you want you can create a commit to accept a FastRandomContext reference as param to BuildBlockTestCase, then in each unit test, create a local FastRandomContext and bind it to a lambda named BuildBlockTestCase.

@AngusP
Copy link
Contributor Author

AngusP commented Jun 12, 2024

Pushed 09b90c6 which makes the tests deterministic by using predictable 'random' numbers. I've jsut picked them, there's nothing particularly special about their values.

Using this to reproduce the issue @dergoegge pointed out:

diff --git a/src/blockencodings.cpp b/src/blockencodings.cpp
index 695e8d806a..64d635a97a 100644
--- a/src/blockencodings.cpp
+++ b/src/blockencodings.cpp
@@ -44,7 +44,8 @@ void CBlockHeaderAndShortTxIDs::FillShortTxIDSelector() const {

 uint64_t CBlockHeaderAndShortTxIDs::GetShortID(const Wtxid& wtxid) const {
     static_assert(SHORTTXIDS_LENGTH == 6, "shorttxids calculation assumes 6-byte shorttxids");
-    return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0xffffffffffffL;
+    // return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0xffffffffffffL;
+    return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0x0f;
 }


and then running

set -e;
n=0;
while (( n++ < 5000 )); do
    src/test/test_bitcoin --run_test=blockencodings_tests;
done

the failures were either READ_STATUS_FAILED from InitData, or IsTxAvailable(...) == false when we were expecting true.

On the above-patched 09b90c6, I've not seen any failures with 0x0f short IDs and ~10k re-runs of all the blockencodings tests.

  • Use a new FastRandomContext with a fixed seed in each test, to ensure 'random' uint256s used as fake prevouts are deterministic, so in-turn test txids and short IDs are deterministic and don't collide causing very rare but flaky test failures.
  • Add new test-only/internal initializer for CBlockHeaderAndShortTxIDs that takes a specified nonce to further ensure determinism and avoid rare but undesireable short ID collisions.
    In a test context this nonce is set to a fixed known-good value. Normally it is random, as previously.

@AngusP AngusP requested a review from dergoegge June 12, 2024 21:36
Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 09b90c6

@DrahtBot DrahtBot requested a review from glozow June 17, 2024 10:35
@DrahtBot DrahtBot requested a review from glozow June 17, 2024 11:58
@AngusP AngusP force-pushed the 2024-06-cmpctblk-extra-txn-test branch 2 times, most recently from 8df9d16 to bba77a2 Compare June 18, 2024 22:15
@DrahtBot

This comment was marked as outdated.

@AngusP AngusP requested a review from dergoegge June 19, 2024 12:28
@AngusP AngusP force-pushed the 2024-06-cmpctblk-extra-txn-test branch 2 times, most recently from 34159eb to c24d2bb Compare June 19, 2024 21:50
refactor: CBlockHeaderAndShortTxIDs constructor now always takes an explicit nonce.
test: Make blockencodings_tests deterministic using fixed seed providing deterministic
CBlockHeaderAndShortTxID nonces and dummy transaction IDs.

Fixes very rare flaky test failures, where the ShortIDs of test transactions collide, leading to
`READ_STATUS_FAILED` from PartiallyDownloadedBlock::InitData and/or `IsTxAvailable` giving `false`
when the transaction should actually be available.

 * Use a new `FastRandomContext` with a fixed seed in each test, to ensure 'random' uint256s
   used as fake prevouts are deterministic, so in-turn test txids and short IDs are deterministic
   and don't collide causing very rare but flaky test failures.
 * Add new test-only/internal initializer for `CBlockHeaderAndShortTxIDs` that takes a specified
   nonce to further ensure determinism and avoid rare but undesireable short ID collisions.
   In a test context this nonce is set to a fixed known-good value. Normally it is random, as
   previously.

Flaky test failures can be reproduced with:

```patch
diff --git a/src/blockencodings.cpp b/src/blockencodings.cpp
index 695e8d8..64d635a97a 100644
--- a/src/blockencodings.cpp
+++ b/src/blockencodings.cpp
@@ -44,7 +44,8 @@ void CBlockHeaderAndShortTxIDs::FillShortTxIDSelector() const {

 uint64_t CBlockHeaderAndShortTxIDs::GetShortID(const Wtxid& wtxid) const {
     static_assert(SHORTTXIDS_LENGTH == 6, "shorttxids calculation assumes 6-byte shorttxids");
-    return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0xffffffffffffL;
+    // return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0xffffffffffffL;
+    return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0x0f;
 }

```

to increase the likelihood of a short ID collision; and running

```shell
set -e;
n=0;
while (( n++ < 5000 )); do
    src/test/test_bitcoin --run_test=blockencodings_tests;
done
```
@AngusP AngusP force-pushed the 2024-06-cmpctblk-extra-txn-test branch from c24d2bb to 55eea00 Compare June 19, 2024 21:56
@AngusP AngusP requested review from glozow and dergoegge June 20, 2024 07:45
Copy link
Contributor

@marcofleon marcofleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 55eea00. I ran the blockencodings unit test and no issues with the new test case.

Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 55eea00

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 55eea00

@@ -2549,7 +2549,7 @@ void PeerManagerImpl::ProcessGetBlockData(CNode& pfrom, Peer& peer, const CInv&
if (a_recent_compact_block && a_recent_compact_block->header.GetHash() == pindex->GetBlockHash()) {
MakeAndPushMessage(pfrom, NetMsgType::CMPCTBLOCK, *a_recent_compact_block);
} else {
CBlockHeaderAndShortTxIDs cmpctblock{*pblock};
CBlockHeaderAndShortTxIDs cmpctblock{*pblock, GetRand<uint64_t>()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another PR could change these calls to use PeerManagerImpl::m_rng though it would require taking off the g_msgproc_mutex guard.

@glozow glozow merged commit 0bd2bd1 into bitcoin:master Jul 1, 2024
@bitcoin bitcoin locked and limited conversation to collaborators Jul 1, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants