-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Package Relay 1/3: Introduce TxDownloadManager and improve orphan-handling #28031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28031. ReviewsSee the guideline for information on the review process. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
1d61590
to
116378e
Compare
Looks like the CI fails:
|
Investigating, thanks @DrahtBot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial comments through 116378e
Log changes suggested are helpful for tracing what's happening in the orphanage on my node I'm testing.
src/node/txpackagetracker.cpp
Outdated
// parents are (it could also be that the orphan has already been resolved). | ||
// Give up. | ||
orphan_request_tracker.ForgetTxHash(gtxid.GetHash()); | ||
LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid); | |
LogPrint(BCLog::TXPACKAGES, "\nForgetting orphan request %s from peer=%d\n", gtxid.GetHash().ToString(), nodeid); |
test/functional/p2p_segwit.py
Outdated
@@ -2058,7 +2058,7 @@ def received_wtxidrelay(): | |||
test_transaction_acceptance(self.nodes[0], self.wtx_node, tx2, with_witness=True, accepted=False) | |||
|
|||
# Expect a request for parent (tx) by txid despite use of WTX peer | |||
self.wtx_node.wait_for_getdata([tx.sha256], 60) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we programatically justify this magic :)
src/net_processing.cpp
Outdated
AssertLockHeld(::cs_main); // For m_txrequest | ||
const bool connected = m_connman.ForNode(nodeid, [](CNode* node) { return node->fSuccessfullyConnected && !node->fDisconnect; }); | ||
if (!connected) return; | ||
if (m_txpackagetracker->Count(nodeid) + m_txrequest.Count(nodeid) >= MAX_PEER_TX_ANNOUNCEMENTS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I've mentioned this offline as well)
The locking assumptions around this check are weird because m_txpackagetracker
has its own internal mutex where as m_txrequest
is guarded by cs_main
. There is nothing stopping m_txpackagetracker->Count()
form returning something different right after this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point.
It seems that, given the need to synchronize between TxRequestTracker
and the package tracking stuff, we should have them both guarded by 1 lock. Looking at #26151 (review) it seems like we could have a m_txrequest GUARDED_BY(tx_download_mutex)
, m_txpackagetracker GUARDED_BY(tx_download_mutex)
, and lock it from these peerman functions?
Alternatively, I wonder if it makes sense to take it a step further and put this all in a TxDownloadManager
module that wraps orphanage, txrequest, and package tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more comments, not finished yet though.
src/net_processing.cpp
Outdated
@@ -2966,7 +2966,7 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer) | |||
LogPrint(BCLog::MEMPOOL, " accepted orphan tx %s\n", orphan_wtxid.ToString()); | |||
RelayTransaction(orphanHash, porphanTx->GetWitnessHash()); | |||
m_txpackagetracker->AddChildrenToWorkSet(*porphanTx); | |||
m_txpackagetracker->EraseOrphanTx(orphanHash); | |||
m_txpackagetracker->EraseOrphanTx(porphanTx->GetWitnessHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could use orphan_wtxid
introduced in the previous commit, here and in various other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts about logging:
After testing this on mainnet for a bit, I think it would be nice to be able to follow the fate of each individual orphan in BCLog::TXPACKAGES
:
We currently have a tx-level based log entry for addition (“stored orphan..."), but nothing for removal, so having additional log entries in MempoolAcceptedTx()
for succesful resolution and MempoolRejectedTx()
for failure would be nice, maybe also a tx-level message for removal due to expiration and overflow.
If this became too spammy, we could use different severities, but I think it shouldn't be too bad except if you just started with an empty mempool?
Also, could we log wtxids whenever possible? There is currently a bit of a mix.
1aee6d3
to
5f07322
Compare
801d06f
to
41d8400
Compare
This means we no longer return parents we already have in the m_unique_parents result from MempoolRejectedTx. We need to separate the loop that checks AlreadyHave parents from the loop that adds parents as announcments, because we may do the latter loop multiple times for different peers.
Now that we track all announcers of an orphan, it's not helpful to consider an orphan provided by a peer that didn't send us this parent. It can only hurt our chances of finding the right orphan when there are multiple candidates. Adapt the 2 tests in p2p_opportunistic_1p1c.py that looked at 1p1c packages from different peers. Instead of checking that the right peer is punished, we now check that the package is not submitted. We can't use the functional test to see that the package was not considered because the behavior is indistinguishable (except for the logs).
The places where m_tx_download_mutex is held in between calls to txdownloadman are unnecessary. For example, m_tx_download_mutex was previously held throughout these steps: 1. ReceivedTx(), receive should_validate=true 2. MempoolRejectedTx(tx), receive package_to_validate={parent, child} 3. MempoolRejectedPackage(parent, child) 4. MempoolRejectedTx(parent) 5. MempoolRejectedTx(child) But this is not necessary for correctness. If, for example, BlockConnected is called in between any of the steps, txdownloadman's internal state will remain consistent, even if the relevant transaction(s) confirmed. See the txdownloadman fuzzers. This sets up the next commit which moves locking into TxDownloadManagerImpl.
It is named slightly differently to make it easy to check that m_tx_download_mutex is fully removed in the next commit.
41d8400
to
0a85e99
Compare
Rebased. I have split the first 12 commits (the majority of this PR) into #31397. The other 5 commits are for making the module internally thread-safe, and can be opened in another PR afterward. |
🐙 This pull request conflicts with the target branch and needs rebase. |
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
9eac5a0 [functional test] transaction orphan handling (glozow) 61e77bb [test framework] make it easier to fast-forward setmocktime (glozow) Pull request description: I was doing some mutation testing (through reckless refactoring) locally and found some specific behaviors in orphan handling that weren't picked up by tests. Adding some of these test cases now can maybe help with reviewing refactors like bitcoin#28031. - Parent requests aren't sent immediately. A delay is added and the requests are filtered by AlreadyHaveTx before they are sent, which means you can't use fake orphans to probe precise arrival timing of a tx. - Parent requests include all that are not AlreadyHaveTx. This means old confirmed parents may be requested. - The node does not give up on orphans if the peer responds to a parent request with notfound. This means that if a parent is an old confirmed transaction (in which notfound is expected), the orphan should still be resolved. - Rejected parents can cause an orphan to be dropped, but it depends on the reason and only based on txid. - Rejected parents can cause an orphan to be rejected too, by both wtxid and txid. - Requests for orphan parents should be de-duplicated with "regular" txrequest. If a missing parent has the same hash as an in-flight request, it shouldn't be requested. - Multiple orphans with overlapping parents should not cause duplicated parent requests. ACKs for top commit: instagibbs: reACK bitcoin@9eac5a0 dergoegge: reACK 9eac5a0 achow101: ACK 9eac5a0 fjahr: Code review ACK 9eac5a0 Tree-SHA512: 85488dc6a3f62cf0c38e7dfe7839c01215b44b172d1755b18164d41d01038f3a749451241e4eba8b857fd344a445740b21d6382c45977234b21460e3f53b1b2a
9eac5a0 [functional test] transaction orphan handling (glozow) 61e77bb [test framework] make it easier to fast-forward setmocktime (glozow) Pull request description: I was doing some mutation testing (through reckless refactoring) locally and found some specific behaviors in orphan handling that weren't picked up by tests. Adding some of these test cases now can maybe help with reviewing refactors like bitcoin#28031. - Parent requests aren't sent immediately. A delay is added and the requests are filtered by AlreadyHaveTx before they are sent, which means you can't use fake orphans to probe precise arrival timing of a tx. - Parent requests include all that are not AlreadyHaveTx. This means old confirmed parents may be requested. - The node does not give up on orphans if the peer responds to a parent request with notfound. This means that if a parent is an old confirmed transaction (in which notfound is expected), the orphan should still be resolved. - Rejected parents can cause an orphan to be dropped, but it depends on the reason and only based on txid. - Rejected parents can cause an orphan to be rejected too, by both wtxid and txid. - Requests for orphan parents should be de-duplicated with "regular" txrequest. If a missing parent has the same hash as an in-flight request, it shouldn't be requested. - Multiple orphans with overlapping parents should not cause duplicated parent requests. ACKs for top commit: instagibbs: reACK bitcoin@9eac5a0 dergoegge: reACK 9eac5a0 achow101: ACK 9eac5a0 fjahr: Code review ACK 9eac5a0 Tree-SHA512: 85488dc6a3f62cf0c38e7dfe7839c01215b44b172d1755b18164d41d01038f3a749451241e4eba8b857fd344a445740b21d6382c45977234b21460e3f53b1b2a
…tegory a3b55c9 [doc] move comment about AlreadyHaveTx DoS score to the right place (glozow) 3b8c178 [log] add more logs related to orphan handling (glozow) 51b3275 [log] add category TXPACKAGES for orphanage and package relay (glozow) a33dde1 [log] include wtxid in tx {relay,validation,orphanage} logging (glozow) Pull request description: This was taken from bitcoin#28031 (see bitcoin#27463 for project tracking). - Log wtxids in addition to txids when possible. This allows us to track the fate of a transaction from inv to mempool accept/reject through logs. - Additional orphan-related logging to make testing and debugging easier. Suggested in bitcoin#28031 (review) and bitcoin#28031 (comment) - Add `TXPACKAGES` category for logging. - Move a nearby comment block that was in the wrong place. ACKs for top commit: instagibbs: reACK bitcoin@a3b55c9 achow101: ACK a3b55c9 brunoerg: crACK a3b55c9 mzumsande: Code review ACK a3b55c9 Tree-SHA512: 21884ef7c2ea2fd006e715574a9dd3e6cbbe8f82d62c6187fe1d39aad5a834051203fda5f355a06ca40c3e2b9561aec50d7c922a662b1edc96f7b552c9f4b24d
See #27463 for full project tracking.
Please see #27742 for how this PR fits into the big picture. This PR is based on a more recent commit than that one.
This branch includes:
(1) (now merged, see #30110) Introduces
TxDownloadManager
, which handles all transaction downloading. Adds tests forTxDownloadManager
.(2) (now merged, see #31397) Adds an "orphan resolution module". It adds all announcers of an orphan as potential resolution candidates, in a tracker implemented as a
TxRequestTracker
. In this PR, "orphan resolution" means requesting missing parents bygetdata(MSG_TX | MSG_WITNESS_FLAG, missing_txid)
. In a future PR, we'll add another resolution method, requesting ancestor wtxids usinggetdata(MSG_ANCPKGINFO, orphan_wtxid)
.(3) Makes
TxDownloadManager
internally thread-safe