Skip to content

Conversation

glozow
Copy link
Member

@glozow glozow commented Jul 5, 2023

See #27463 for full project tracking.
Please see #27742 for how this PR fits into the big picture. This PR is based on a more recent commit than that one.

This branch includes:
(1) (now merged, see #30110) Introduces TxDownloadManager, which handles all transaction downloading. Adds tests for TxDownloadManager.
(2) (now merged, see #31397) Adds an "orphan resolution module". It adds all announcers of an orphan as potential resolution candidates, in a tracker implemented as a TxRequestTracker. In this PR, "orphan resolution" means requesting missing parents by getdata(MSG_TX | MSG_WITNESS_FLAG, missing_txid). In a future PR, we'll add another resolution method, requesting ancestor wtxids using getdata(MSG_ANCPKGINFO, orphan_wtxid).
(3) Makes TxDownloadManager internally thread-safe

@glozow glozow added the P2P label Jul 5, 2023
@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 5, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28031.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31666 (multi-peer orphan resolution followups by glozow)
  • #31650 (refactor: Enforce readability-avoid-const-params-in-decls by maflcko)
  • #31397 (p2p: track and use all potential peers for orphan resolution by glozow)
  • #27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 6, 2023

Looks like the CI fails:

�[0m�[0;31mp2p_orphan_handling.py                                 | ✖ Failed  | 2407 s

@glozow
Copy link
Member Author

glozow commented Jul 6, 2023

Investigating, thanks @DrahtBot

Copy link
Member

@instagibbs instagibbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments through 116378e

Log changes suggested are helpful for tracing what's happening in the orphanage on my node I'm testing.

// parents are (it could also be that the orphan has already been resolved).
// Give up.
orphan_request_tracker.ForgetTxHash(gtxid.GetHash());
LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid);
LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan request %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid);

@@ -2058,7 +2058,7 @@ def received_wtxidrelay():
test_transaction_acceptance(self.nodes[0], self.wtx_node, tx2, with_witness=True, accepted=False)

# Expect a request for parent (tx) by txid despite use of WTX peer
self.wtx_node.wait_for_getdata([tx.sha256], 60)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we programatically justify this magic :)

AssertLockHeld(::cs_main); // For m_txrequest
const bool connected = m_connman.ForNode(nodeid, [](CNode* node) { return node->fSuccessfullyConnected && !node->fDisconnect; });
if (!connected) return;
if (m_txpackagetracker->Count(nodeid) + m_txrequest.Count(nodeid) >= MAX_PEER_TX_ANNOUNCEMENTS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I've mentioned this offline as well)

The locking assumptions around this check are weird because m_txpackagetracker has its own internal mutex where as m_txrequest is guarded by cs_main. There is nothing stopping m_txpackagetracker->Count() form returning something different right after this check.

Copy link
Member Author

@glozow glozow Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point.

It seems that, given the need to synchronize between TxRequestTracker and the package tracking stuff, we should have them both guarded by 1 lock. Looking at #26151 (review) it seems like we could have a m_txrequest GUARDED_BY(tx_download_mutex), m_txpackagetracker GUARDED_BY(tx_download_mutex), and lock it from these peerman functions?

Alternatively, I wonder if it makes sense to take it a step further and put this all in a TxDownloadManager module that wraps orphanage, txrequest, and package tracking.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more comments, not finished yet though.

@@ -2966,7 +2966,7 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
LogPrint(BCLog::MEMPOOL, " accepted orphan tx %s\n", orphan_wtxid.ToString());
RelayTransaction(orphanHash, porphanTx->GetWitnessHash());
m_txpackagetracker->AddChildrenToWorkSet(*porphanTx);
m_txpackagetracker->EraseOrphanTx(orphanHash);
m_txpackagetracker->EraseOrphanTx(porphanTx->GetWitnessHash());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use orphan_wtxid introduced in the previous commit, here and in various other places.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts about logging:

After testing this on mainnet for a bit, I think it would be nice to be able to follow the fate of each individual orphan in BCLog::TXPACKAGES:
We currently have a tx-level based log entry for addition (“stored orphan..."), but nothing for removal, so having additional log entries in MempoolAcceptedTx() for succesful resolution and MempoolRejectedTx() for failure would be nice, maybe also a tx-level message for removal due to expiration and overflow.

If this became too spammy, we could use different severities, but I think it shouldn't be too bad except if you just started with an empty mempool?

Also, could we log wtxids whenever possible? There is currently a bit of a mix.

@glozow glozow mentioned this pull request Aug 2, 2023
@glozow glozow marked this pull request as draft August 3, 2023 14:07
@glozow glozow force-pushed the orphan-resolution-module branch from 1aee6d3 to 5f07322 Compare November 28, 2024 12:59
@glozow glozow force-pushed the orphan-resolution-module branch 2 times, most recently from 801d06f to 41d8400 Compare November 28, 2024 15:15
This means we no longer return parents we already have in the
m_unique_parents result from MempoolRejectedTx.

We need to separate the loop that checks AlreadyHave parents from the
loop that adds parents as announcments, because we may do the latter
loop multiple times for different peers.
Now that we track all announcers of an orphan, it's not helpful to
consider an orphan provided by a peer that didn't send us this parent.
It can only hurt our chances of finding the right orphan when there are
multiple candidates.

Adapt the 2 tests in p2p_opportunistic_1p1c.py that looked at 1p1c
packages from different peers. Instead of checking that the right peer
is punished, we now check that the package is not submitted. We can't
use the functional test to see that the package was not considered
because the behavior is indistinguishable (except for the logs).
The places where m_tx_download_mutex is held in between calls to
txdownloadman are unnecessary.

For example, m_tx_download_mutex was previously held throughout these steps:
  1. ReceivedTx(), receive should_validate=true
  2. MempoolRejectedTx(tx), receive package_to_validate={parent, child}
  3. MempoolRejectedPackage(parent, child)
  4. MempoolRejectedTx(parent)
  5. MempoolRejectedTx(child)

But this is not necessary for correctness. If, for example,
BlockConnected is called in between any of the steps, txdownloadman's
internal state will remain consistent, even if the relevant
transaction(s) confirmed. See the txdownloadman fuzzers.

This sets up the next commit which moves locking into TxDownloadManagerImpl.
It is named slightly differently to make it easy to check that
m_tx_download_mutex is fully removed in the next commit.
@glozow glozow force-pushed the orphan-resolution-module branch from 41d8400 to 0a85e99 Compare November 28, 2024 16:06
@glozow
Copy link
Member Author

glozow commented Dec 1, 2024

Rebased. I have split the first 12 commits (the majority of this PR) into #31397. The other 5 commits are for making the module internally thread-safe, and can be opened in another PR afterward.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@glozow
Copy link
Member Author

glozow commented Apr 25, 2025

Closing because this is pretty completed (yay!). TxDownloadManager was added in #30110, and the multi-announcer orphan handling was added in #31397. We just need the last 2 commits to make txdownloadman internally thread-safe, but I believe we have somebody working on it.

@glozow glozow closed this Apr 25, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done: Mempool in Package Relay + V3 Project Tracking Apr 25, 2025
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Jul 27, 2025
9eac5a0 [functional test] transaction orphan handling (glozow)
61e77bb [test framework] make it easier to fast-forward setmocktime (glozow)

Pull request description:

  I was doing some mutation testing (through reckless refactoring) locally and found some specific behaviors in orphan handling that weren't picked up by tests. Adding some of these test cases now can maybe help with reviewing refactors like bitcoin#28031.

  - Parent requests aren't sent immediately. A delay is added and the requests are filtered by AlreadyHaveTx before they are sent, which means you can't use fake orphans to probe precise arrival timing of a tx.
  - Parent requests include all that are not AlreadyHaveTx. This means old confirmed parents may be requested.
  - The node does not give up on orphans if the peer responds to a parent request with notfound. This means that if a parent is an old confirmed transaction (in which notfound is expected), the orphan should still be resolved.
  - Rejected parents can cause an orphan to be dropped, but it depends on the reason and only based on txid.
  - Rejected parents can cause an orphan to be rejected too, by both wtxid and txid.
  - Requests for orphan parents should be de-duplicated with "regular" txrequest. If a missing parent has the same hash as an in-flight request, it shouldn't be requested.
  - Multiple orphans with overlapping parents should not cause duplicated parent requests.

ACKs for top commit:
  instagibbs:
    reACK bitcoin@9eac5a0
  dergoegge:
    reACK 9eac5a0
  achow101:
    ACK 9eac5a0
  fjahr:
    Code review ACK 9eac5a0

Tree-SHA512: 85488dc6a3f62cf0c38e7dfe7839c01215b44b172d1755b18164d41d01038f3a749451241e4eba8b857fd344a445740b21d6382c45977234b21460e3f53b1b2a
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Aug 3, 2025
9eac5a0 [functional test] transaction orphan handling (glozow)
61e77bb [test framework] make it easier to fast-forward setmocktime (glozow)

Pull request description:

  I was doing some mutation testing (through reckless refactoring) locally and found some specific behaviors in orphan handling that weren't picked up by tests. Adding some of these test cases now can maybe help with reviewing refactors like bitcoin#28031.

  - Parent requests aren't sent immediately. A delay is added and the requests are filtered by AlreadyHaveTx before they are sent, which means you can't use fake orphans to probe precise arrival timing of a tx.
  - Parent requests include all that are not AlreadyHaveTx. This means old confirmed parents may be requested.
  - The node does not give up on orphans if the peer responds to a parent request with notfound. This means that if a parent is an old confirmed transaction (in which notfound is expected), the orphan should still be resolved.
  - Rejected parents can cause an orphan to be dropped, but it depends on the reason and only based on txid.
  - Rejected parents can cause an orphan to be rejected too, by both wtxid and txid.
  - Requests for orphan parents should be de-duplicated with "regular" txrequest. If a missing parent has the same hash as an in-flight request, it shouldn't be requested.
  - Multiple orphans with overlapping parents should not cause duplicated parent requests.

ACKs for top commit:
  instagibbs:
    reACK bitcoin@9eac5a0
  dergoegge:
    reACK 9eac5a0
  achow101:
    ACK 9eac5a0
  fjahr:
    Code review ACK 9eac5a0

Tree-SHA512: 85488dc6a3f62cf0c38e7dfe7839c01215b44b172d1755b18164d41d01038f3a749451241e4eba8b857fd344a445740b21d6382c45977234b21460e3f53b1b2a
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Aug 3, 2025
…tegory

a3b55c9 [doc] move comment about AlreadyHaveTx DoS score to the right place (glozow)
3b8c178 [log] add more logs related to orphan handling (glozow)
51b3275 [log] add category TXPACKAGES for orphanage and package relay (glozow)
a33dde1 [log] include wtxid in tx {relay,validation,orphanage} logging (glozow)

Pull request description:

  This was taken from bitcoin#28031 (see bitcoin#27463 for project tracking).

  - Log wtxids in addition to txids when possible. This allows us to track the fate of a transaction from inv to mempool accept/reject through logs.
  - Additional orphan-related logging to make testing and debugging easier. Suggested in bitcoin#28031 (review) and bitcoin#28031 (comment)
  - Add `TXPACKAGES` category for logging.
  - Move a nearby comment block that was in the wrong place.

ACKs for top commit:
  instagibbs:
    reACK bitcoin@a3b55c9
  achow101:
    ACK a3b55c9
  brunoerg:
    crACK a3b55c9
  mzumsande:
    Code review ACK a3b55c9

Tree-SHA512: 21884ef7c2ea2fd006e715574a9dd3e6cbbe8f82d62c6187fe1d39aad5a834051203fda5f355a06ca40c3e2b9561aec50d7c922a662b1edc96f7b552c9f4b24d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

7 participants