Conversation

glozow
Member

@glozow glozow commented Nov 30, 2024

Part of #27463.

(Transaction) orphan resolution is a process that kicks off when we are missing UTXOs to validate an unconfirmed transaction. We currently request missing parents by txid; BIP 331 also defines a way to explicitly request ancestors.

Currently, when we find that a transaction is an orphan, we only try to resolve it with the peer who provided the tx. If this doesn't work out (e.g. they send a notfound or don't respond), we do not try again. We actually can't, because we've already forgotten who else could resolve this orphan (i.e. all the other peers who announced the transaction).

What is wrong with this? It makes transaction download less reliable, particularly for 1p1c packages which must go through orphan resolution in order to be downloaded.

Can we fix this with BIP 331 / is this "duct tape" before the real solution?
BIP 331 (receiver-initiated ancestor package relay) is also based on the idea that there is an orphan that needs resolution, but it's just a new way of communicating information. It's not inherently more honest; you can request ancestor package information and get a notfound. So ancestor package relay still requires some kind of procedure for retrying when an orphan resolution attempt fails. See the #27742 implementation which builds on this orphan resolution tracker to keep track of what packages to download (it just isn't rebased on this exact branch). The difference when using BIP 331 is that we request ancpkginfo and then pkgtxns instead of the parent txids.

Zooming out, we'd like orphan handling to be:

  • Bandwidth-efficient: don't have too many requests out at once. As already implemented today, transaction requests for orphan parents and regular download both go through the TxRequestTracker so that we don't have duplicate requests out.
  • Not vulnerable to censorship: don't give up too easily, use all candidate peers. See e.g. https://bitcoincore.org/en/2024/07/03/disclose_already_asked_for/
  • Load-balance between peers: don't overload peers; use all peers available. This is also useful for when we introduce per-peer orphan protection, since each peer will have limited slots.

The approach taken in this PR is to think of each peer who announces an orphan as a potential "orphan resolution candidate." These candidates include:

  • the peer who sent us the orphan tx
  • any peers who announced the orphan prior to us downloading it
  • any peers who subsequently announce the orphan after we have started trying to resolve it

For each orphan resolution candidate, we treat them as having "announced" all of the orphan's missing parents to us at the time we received the orphan transaction (or at the time they announced the tx, if that happens after we have already started tracking it as an orphan). We add the missing parents as entries to m_txrequest, reusing the typical txrequest processing logic, which means we prefer outbound peers, avoid duplicate in-flight requests, don't overload peers, etc.
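The bookkeeping described above can be sketched as a toy model. This is illustrative Python only, not Bitcoin Core's actual classes: `OrphanResolutionModel`, `add_candidate`, and the data layout are hypothetical stand-ins for the orphanage-plus-txrequest interaction.

```python
from collections import defaultdict

class OrphanResolutionModel:
    """Toy model: every peer that announces an orphan is treated as if it
    had announced the orphan's missing parents."""

    def __init__(self):
        # wtxid -> set of peers that announced the orphan
        self.announcers = defaultdict(set)
        # (peer, parent_txid) pairs that would feed a txrequest-style tracker,
        # which handles load-balancing and avoiding duplicate in-flight requests
        self.parent_requests = set()

    def add_candidate(self, wtxid, peer, missing_parents):
        """Register peer as an orphan resolution candidate for wtxid."""
        self.announcers[wtxid].add(peer)
        for parent in missing_parents:
            self.parent_requests.add((peer, parent))

model = OrphanResolutionModel()
model.add_candidate("orphan_w", peer=0, missing_parents=["p1", "p2"])
model.add_candidate("orphan_w", peer=1, missing_parents=["p1", "p2"])
assert model.announcers["orphan_w"] == {0, 1}
assert len(model.parent_requests) == 4
```

The key point the sketch shows: candidates accumulate per orphan, so if one peer fails to resolve it, the others remain available.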

@glozow glozow added the P2P label Nov 30, 2024
@DrahtBot
Contributor

DrahtBot commented Nov 30, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31397.

Reviews

See the guideline for information on the review process.

Type        | Reviewers
ACK         | marcofleon, instagibbs, dergoegge, mzumsande
Concept ACK | sipa

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31628 (test: add coverage for immediate orphanage eviction case by instagibbs)
  • #28031 (Package Relay 1/3: Introduce TxDownloadManager and improve orphan-handling by glozow)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Member

@dergoegge dergoegge left a comment

Concept ACK

I implemented what I was talking about in the review club yesterday here: https://github.com/dergoegge/bitcoin/commits/2024-12-rslv-orphns/ (the only difference is the main commit 3928b40) and it looks like it passes your functional test additions! So perhaps it's worth exploring if we actually need another tracker...

IRC logs
17:34 	<dergoegge> I was wondering why we need another txrequest tracker instance for this? couldn’t we just keep track of announcers in the orphanage (same as in the PR) and then add requests for the parents to the existing tracker on future announcements?
17:35 	<glozow> dergoegge: that's an idea. How would you schedule the requests?
17:35 	<dergoegge> ideally at the same time
17:35 	<glozow> Is there a cost to having another txrequest tracker? It's not that different from adding another std::map
17:35 	<glozow> No, I mean, how would you load-balance between peers, bake in preferences, etc.
17:36 	<dergoegge> isn't that what the existing tracker does?
17:36 	<glozow> Oh, you mean adding a new type to the tracker? So we'd have txid type, wtxid type, and orphan?
17:37 	<glozow> also note that the parent requests are added to the txrequest tracker
17:39 	<dergoegge> I guess I'm wondering why we need the concept of tracking the resolution by orphan id, as opposed to just putting the requests for the parents in the existing tracker
17:39 	<glozow> we do put the requests for the parents in the existing tracker
17:39 	<glozow> Maybe we are crossing wires?
17:39 	<instagibbs> mmmm he's asking why the add to new tracker, then "immediately" add to old one, vs just add to old one, I think
17:40 	<dergoegge> yea
17:40 	<instagibbs> add to old one with "proper delays"
17:40 	<instagibbs> I didn't get far enough in my review to figure this out either 😅
17:40 	<glozow> We might have multiple candidates for orphan resolution
17:41 	<glozow> Oh I see what you're saying
17:42 	<dergoegge> "multiple candidates" as in same parents different orphan?
17:42 	<glozow> Perhaps that could work, where you're treating it as if all of them just announced the missing parents? I don't know how you'd add `ancpkginfo` orphan resolution easily this way though.
17:43 	<glozow> different peers same orphan
17:43 	<instagibbs> You'd also have to somehow track that you're no longer asking for any parents of an orphan in order to EraseOrphanOfPeer?
17:43 	<marcofleon> yeah i was thinking it made sense with GetCandidatePeers. Having another tracker to separate out the orphan reso process
17:46 	<glozow> will think about this idea some more
17:47 	<dergoegge> me too 👍
17:47 	<glozow> I think it's possible it works? My main questions would be (1) what is the cost of having a second tracker? Because it's the perfect data structure for this. (2) how do we make it extensible to package relay still.
17:48 	<instagibbs> imo the cost is mostly an additional structure lying around that needs to stay in sync with other things
17:48 	<dergoegge> 1) if we don't need it then it's just extra complexity 2) fair!
17:48 	<marcofleon> The fact that there are candidates that be added or not added to that new tracker is why it made sense to me in the first place I guess is what i was saying
17:48 	<marcofleon> can be*
17:49 	<glozow> (1) shoving it into the existing tracker but needing to have extra logic could also be complexity!
17:49 	<dergoegge> well in my mind it wouldn't need extra logic but I might be wrong, need to think more
17:49 	<instagibbs> proof of code for this I think..
17:49 	<glozow> but yeah, I don't like that we need to ensure m_orphanage and m_orphan_resolution_tracker are in sync. that's super fair
17:49 	<dergoegge> frfr
17:50 	<glozow> yeah let's see what the code would look like
17:50 	<marcofleon> fr
17:50 	<glozow> fr r r

@glozow
Member Author

glozow commented Dec 5, 2024

Thanks @dergoegge, will spend some time thinking about whether there are any problems with it. Perhaps we can do this easier version now and cross the orphan reso tracker bridge when we need to build the 2-step orphan resolution protocol (info and then transactions).

self.log.info("Check that fake parent does not cause orphan to be deleted and real package can still be submitted")
# 6. Child-sending should not have been punished and the orphan should remain in orphanage.
# 5. Child-sending should not have been punished and the orphan should remain in orphanage.
Member

while we're touching this, we can directly assert via getorphantxs now

Member Author

Done, and added a multi-announcer test to rpc_orphans.py

@glozow glozow force-pushed the 2024-11-multi-orphan branch from 3e16a36 to f96df3d Compare December 6, 2024 12:27
Member Author

@glozow glozow left a comment

Changed to the new approach, cleaned up a bunch of things, addressed most comments. I think I have 1 or 2 more to get to, and need to write an RPC test.

@glozow
Member Author

glozow commented Dec 6, 2024

The 2 main functional differences I noticed with this approach are:
(1) If there are multiple missing parents, we may use different peers to request them.
(2) We don't EraseOrphanOfPeer if orphan resolution expires. This also means we don't delete the orphan after we've exhausted all potential orphan resolution candidates. This is similar to what we have today - the entry will stay for 20min before the orphanage expires it.

Member

@instagibbs instagibbs left a comment

took another pass with the major new commit f96df3d

@@ -135,7 +135,7 @@ BOOST_AUTO_TEST_CASE(DoS_mapOrphans)
tx.vout[0].nValue = 1*CENT;
tx.vout[0].scriptPubKey = GetScriptForDestination(PKHash(key.GetPubKey()));

orphanage.AddTx(MakeTransactionRef(tx), i);
orphanage.AddTx(MakeTransactionRef(tx), i, {});
Member

alternatively, just don't set them ever unless they're intended to be used in the test, deleting most of these(?) won't invalidate tests

Member

@instagibbs instagibbs left a comment

ACK 40abc3f

non-blocking suggestions only

@DrahtBot DrahtBot requested a review from dergoegge December 11, 2024 16:58
Contributor

@mzumsande mzumsande left a comment

Concept ACK
Did a first round of review, mostly comment nits (it's been a while since I last looked at the orphanage).

@glozow glozow force-pushed the 2024-11-multi-orphan branch from 40abc3f to a2ab8d2 Compare January 3, 2025 15:56
@glozow
Member Author

glozow commented Jan 3, 2025

Thanks for the reviews @mzumsande @instagibbs! Just got back from holiday, addressed all the comments. Biggest change was recalculating missing parents each time we add a new orphan reso candidate. I also wrote a test for #31397 (comment)

Member

@sipa sipa left a comment

I wonder if it's possible to avoid keeping track of the announcers in m_orphans, as I think it should match m_txrequest.GetCandidatePeers(orphan_tx)?

@@ -574,6 +574,17 @@ class TxRequestTracker::Impl {
}
}

std::vector<NodeId> GetCandidatePeers(const uint256& txhash) const
Member

In commit "[txrequest] GetCandidatePeers"

Would it be possible to add an invocation/comparison with this function in the src/test/fuzz/txrequest.cpp fuzz test?

Member Author

Added in #31666

@glozow
Member Author

glozow commented Jan 3, 2025

I wonder if it's possible to avoid keeping track of the announcers in m_orphans, as I think it should match m_txrequest.GetCandidatePeers(orphan_tx)?

That's true immediately after receiving the orphan, but we delete the entries from m_txrequest once we've added this tx to the orphanage - m_txrequest only stores what we haven't successfully downloaded yet. So I think we need to store the announcers somewhere else to remember that information, remove candidates that fail to resolve the orphan, and add new candidates if they announce the tx. TxOrphanage seems like the best place for it.

@sipa
Member

sipa commented Jan 3, 2025

That's true immediately after receiving the orphan, but we delete the entries from m_txrequest once we've added this tx to the orphanage - m_txrequest only stores what we haven't successfully downloaded yet.

Right, of course!

Member

@sipa sipa left a comment

Concept ACK, some nits/questions:

@@ -301,14 +329,29 @@ void TxDownloadManagerImpl::MempoolAcceptedTx(const CTransactionRef& tx)
m_orphanage.EraseTx(tx->GetWitnessHash());
}

std::vector<Txid> TxDownloadManagerImpl::GetUniqueParents(const CTransaction& tx)
Member

Should this attempt to recurse, and give all ancestors? Not relevant for 1p1c, but in general, in both call sites that seems desirable (an announcement for an orphan should be seen as an announcement for all its ancestors, resolving an orphan means all ancestors can be considered received).

Member Author

Could you elaborate on how we would get missing ancestors beyond parents? We could iterate through the vin of parents we already have, but I don't see how any of those could be missing. And we can't look through the vin of a missing parent.

Member

Ah, by recursing through transactions in the orphanage I meant.

Imagine a sequence A->B->C (A spends an output of B, B spends an output of C). We have received A and B already (in the orphanage), but C is still missing. An announcement from a new peer for transaction A can reasonably be treated as an announcement for C from that peer.
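The recursion sipa describes can be sketched in a few lines. This is a hypothetical Python model (names like `missing_ancestors` and `parents_of` are invented for illustration, not the Core implementation): walk parents we already hold in the orphanage, and collect the ones we don't, since we can't look inside a transaction we don't have.

```python
def missing_ancestors(txid, parents_of, orphanage):
    """parents_of: txid -> list of parent txids we can read from a held tx;
    orphanage: set of txids we hold but have not validated yet."""
    missing, stack, seen = set(), [txid], set()
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        for parent in parents_of.get(cur, []):
            if parent in orphanage:
                stack.append(parent)   # we have it, so recurse into its inputs
            else:
                missing.add(parent)    # cannot read the vin of a tx we lack
    return missing

# A spends an output of B, B spends an output of C; we hold A and B, C is missing.
parents_of = {"A": ["B"], "B": ["C"]}
assert missing_ancestors("A", parents_of, orphanage={"A", "B"}) == {"C"}
```

This is why the recursion naturally bottoms out at missing transactions: the search frontier can only extend through transactions whose inputs we can actually inspect.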

Member Author

Ah I see what you mean, yes!


auto add_orphan_reso_candidate = [&](const CTransactionRef& orphan_tx, std::vector<Txid> unique_parents, NodeId nodeid, std::chrono::microseconds now) {
const auto& wtxid = orphan_tx->GetWitnessHash();
if (auto delay{OrphanResolutionCandidate(nodeid, wtxid, unique_parents.size())}) {
Member

This logic looks very similar to that in TxDownloadManagerImpl::AddTxAnnouncement. Is it possible to abstract it out?

Member Author

Yes, it is similar. I'm open to code diff suggestions.

I tried a few times to refactor but found some things a bit ugly to work around: we force the IsWtxid check to be false since we're requesting parents by txid, and the number of requests is not always 1 (also see #31397 (comment)). In the future, the function may deviate when we add more per-peer orphan resolution limits and/or another way to do orphan resolution.

Member

Something like this?

bool TxDownloadManagerImpl::DoOrphanThingRenameMe(const std::vector<Txid>& unique_parents, const Wtxid& wtxid, NodeId nodeid, std::chrono::microseconds now)
{
    if (auto delay{OrphanResolutionCandidate(nodeid, wtxid, unique_parents.size())}) {
        const auto& info = m_peer_info.at(nodeid).m_connection_info;
        for (const auto& parent_txid : unique_parents) {
            m_txrequest.ReceivedInv(nodeid, GenTxid::Txid(parent_txid), info.m_preferred, now + *delay);
        }
        LogDebug(BCLog::TXPACKAGES, "added peer=%d as a candidate for resolving orphan %s\n", nodeid, wtxid.ToString());
        return true;
    }
    return false;
}

in add_orphan_reso_candidate:

if (DoOrphanThingRenameMe(unique_parents, orphan_tx->GetWitnessHash(), nodeid, now)) {
    m_orphanage.AddTx(orphan_tx, nodeid);
}

in TxDownloadManagerImpl::AddTxAnnouncement:

DoOrphanThingRenameMe(unique_parents, Wtxid::FromUint256(gtxid.GetHash()), peer, now);

Member Author

Ah, makes sense - I thought you meant a different bit of code. Added in #31666

glozow added 4 commits January 6, 2025 09:02
Needed for a later commit adding logic to ask the TxRequestTracker for a
list of announcers.  These announcers should know the parents of the
transaction they announced.
Add ability to add and track multiple announcers per orphan transaction,
erasing announcers but not the entire orphan.

The tx creation code in orphanage_tests needs to be updated so that each
tx is unique, because the CountOrphans() check assumes that calling
EraseForPeer necessarily means its orphans are deleted.

Unused for now.
Contributor

@marcofleon marcofleon left a comment

Code review ACK 86d7135

Ran all the relevant tests, including the fuzz tests for about 50 cpu hours each. The implementation here of keeping track of multiple peers that announce an orphan looks good to me. It makes sense to not be locked in to only one peer for orphan resolution, as that peer could disconnect (in which case we lose the orphan) or be malicious.

Just left some non-blocking nits below.

Also quick question for my own understanding. Is it true that in the honest case, orphan resolution will probably happen with the first peer that announced it to us? The peer should have the ancestors, so those txs should be sent shortly after the orphan tx was received by us, right?

return AlreadyHaveTx(GenTxid::Txid(txid), /*include_reconsiderable=*/false);
});

if (unique_parents.empty()) return true;
Contributor

Took me a bit to figure out how this could happen, so a comment might be helpful here.

Member Author

@glozow glozow Jan 16, 2025

Added a comment in followup and also think we can EraseTx here. (EDIT: realized that was a bad idea, added a comment) See #31666

}
std::sort(unique_parents.begin(), unique_parents.end());
unique_parents.erase(std::unique(unique_parents.begin(), unique_parents.end()), unique_parents.end());
unique_parents = GetUniqueParents(tx);
Contributor

In the loop below this, should it be const Txid& parent_txid? However, if this is more rework of this function than you're looking to do (adding multiple ToUint256), then seems fine to leave as is for now.

Member Author

I feel like they're about the same, so going to leave as is


protected:
/** Helper for getting deduplicated vector of Txids in vin. */
std::vector<Txid> GetUniqueParents(const CTransaction& tx);
Contributor

nit: This doesn't use any member variables of TxDownloadManagerImpl so it could be just a standalone helper function in txdownloadman_impl.cpp.

Member Author

Right. Not changing for now because it may be useful to recurse in-orphanage ancestors in this function.
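For reference, the helper under discussion deduplicates the parent txids found in a transaction's inputs. A Python analogue (illustrative only; the actual helper is C++ and uses the std::sort + std::unique idiom shown in the diff above, which does not preserve order, whereas this sketch keeps first-seen order):

```python
def get_unique_parents(vin_txids):
    """Return the deduplicated parent txids from a tx's inputs,
    preserving first-seen order (the C++ version sorts instead)."""
    seen, unique = set(), []
    for txid in vin_txids:
        if txid not in seen:
            seen.add(txid)
            unique.append(txid)
    return unique

# A tx spending two outputs of the same parent has that parent listed once.
assert get_unique_parents(["t1", "t2", "t1", "t3"]) == ["t1", "t2", "t3"]
```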

Member

@instagibbs instagibbs left a comment

reACK 86d7135

feel completely ok to ignore all my comments

The major change is recomputing the missing parents per announcement of the orphan, rather than keeping a static list for the lifetime of the orphan computed when it was first received, and also correctly accounting for those missing parents when deciding whether to rate-limit the announcements for a specific peer.

// - exists in orphanage
// - peer can be an orphan resolution candidate
if (p2p_inv && gtxid.IsWtxid()) {
if (auto orphan_tx{m_orphanage.GetTx(Wtxid::FromUint256(gtxid.GetHash()))}) {
Member

Suggested change
if (auto orphan_tx{m_orphanage.GetTx(Wtxid::FromUint256(gtxid.GetHash()))}) {
const auto wtxid{Wtxid::FromUint256(gtxid.GetHash())};
if (auto orphan_tx{m_orphanage.GetTx(wtxid)}) {

and use wtxid below in the other two locations

Member Author

Added in #31666

node.sendrawtransaction(tx_replacer_BC["hex"])
assert tx_replacer_BC["txid"] in node.getrawmempool()
node.sendrawtransaction(tx_replacer_C["hex"])
assert tx_replacer_BC["txid"] not in node.getrawmempool()
Member

Suggested change
assert tx_replacer_BC["txid"] not in node.getrawmempool()
assert parent_peekaboo_AB["txid"] not in node.getrawmempool()
assert tx_replacer_BC["txid"] not in node.getrawmempool()

Member Author

Added in #31666

assert tx_replacer_BC["txid"] not in node.getrawmempool()
assert tx_replacer_C["txid"] in node.getrawmempool()

# Second peer is an additional announcer for this orphan
Member

Suggested change
# Second peer is an additional announcer for this orphan
# Second peer is an additional announcer for this orphan with different missing parents than prior announcement

micro-nit something like?

Member Author

Added in #31666


# Disconnect peer1. peer2 should become the new candidate for orphan resolution.
peer1.peer_disconnect()
node.bumpmocktime(NONPREF_PEER_TX_DELAY + TXID_RELAY_DELAY)
Member

think this should work too?

Suggested change
node.bumpmocktime(NONPREF_PEER_TX_DELAY + TXID_RELAY_DELAY)
node.bumpmocktime(TXREQUEST_TIME_SKIP)

Member Author

Added in #31666

@instagibbs
Member

@marcofleon

Also quick question for my own understanding. Is it true that in the honest case, orphan resolution will probably happen with the first peer that announced it to us?

It Depends(TM). If the first announcer was an inbound connection and second was an outbound connection less than 2(?) seconds later, the outbound will be chosen first. see the functional test case test_orphan_handling_prefer_outbound
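The timing effect instagibbs describes can be sketched numerically. This is a toy illustration, not the txrequest implementation: inbound (non-preferred) announcements receive an extra delay before they become requestable, so a slightly later outbound announcer can still win. The 2-second value mirrors the NONPREF_PEER_TX_DELAY constant used in the functional tests; treat it as an assumption here.

```python
NONPREF_PEER_TX_DELAY = 2.0  # seconds; assumed inbound (non-preferred) penalty

def request_time(announce_time, is_outbound):
    """Earliest time (toy model) at which a peer's announcement is requestable."""
    return announce_time + (0.0 if is_outbound else NONPREF_PEER_TX_DELAY)

# Inbound announces at t=0.0, outbound at t=1.5: the outbound peer's
# announcement becomes requestable first, so it is asked first.
assert request_time(1.5, is_outbound=True) < request_time(0.0, is_outbound=False)
```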

Member

@dergoegge dergoegge left a comment

Code review ACK 86d7135

Other reviewers already mentioned the nits that I would have had as well. I think they are fine to address in a follow up.

Contributor

@mzumsande mzumsande left a comment

ACK 86d7135

I reviewed the code and tested it a bit on mainnet with some extra logging: The large majority of orphans gets resolved, a few orphans get stuck for 20 minutes if all of our candidates send NOTFOUNDs for the parent requests, presumably because the parent got removed from their mempools (this has been mentioned in #31397 (comment) ).

@fanquake fanquake merged commit 335798c into bitcoin:master Jan 16, 2025
18 checks passed
@@ -58,6 +58,10 @@ def wrapper(self):
self.generate(self.nodes[0], 1)
self.nodes[0].disconnect_p2ps()
self.nodes[0].bumpmocktime(LONG_TIME_SKIP)
# Check that mempool and orphanage have been cleared
assert_equal(0, len(self.nodes[0].getorphantxs()))
Member

Looks like this fails CI https://cirrus-ci.com/task/4584274617171968?logs=ci#L2784:

[15:13:20.370]  test  2025-01-16T15:13:19.950000Z TestFramework (ERROR): Assertion failed 
[15:13:20.370]                                    Traceback (most recent call last):
[15:13:20.370]                                      File "/ci_container_base/test/functional/test_framework/test_framework.py", line 135, in main
[15:13:20.370]                                        self.run_test()
[15:13:20.370]                                      File "/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/test/functional/p2p_orphan_handling.py", line 812, in run_test
[15:13:20.370]                                        self.test_orphan_of_orphan()
[15:13:20.370]                                      File "/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/test/functional/p2p_orphan_handling.py", line 62, in wrapper
[15:13:20.370]                                        assert_equal(0, len(self.nodes[0].getorphantxs()))
[15:13:20.370]                                      File "/ci_container_base/test/functional/test_framework/util.py", line 77, in assert_equal
[15:13:20.370]                                        raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
[15:13:20.370]                                    AssertionError: not(0 == 2)

Member

wait until?

Member Author

yes, I think it needs to be a wait_until

@glozow glozow deleted the 2024-11-multi-orphan branch January 16, 2025 16:07
for (const auto announcer: elem->second.announcers) {
// Get this source peer's work set, emplacing an empty set if it didn't exist
// (note: if this peer wasn't still connected, we would have removed the orphan tx already)
std::set<Wtxid>& orphan_work_set = m_peer_work_set.try_emplace(announcer).first->second;
Contributor

It seems a bit redundant to add the tx to more than one work set. Either we have all the parents now or we don't, so I don't see a point in attempting to validate it multiple times.
Assigning it to just one peer work set would run the risk of that peer disconnecting just after we assigned it to them, but that could be prevented by adding a last call to ProcessOrphanTx() in FinalizeNode().

On the other hand, checking if we have all parents is probably cheap enough that it doesn't really matter either way?!

Member Author

Good point. I seem to recall drafting a version that randomly selected 1 peer. That maybe saves us from a peer purposefully disconnecting (since they don't know whether we assigned them), but doesn't have this redundancy problem? What do you think?

Contributor

Yes, taking a random peer would be ok.
Though I kinda like the idea of calling ProcessOrphanTx() in FinalizeNode() in general, even if just 1 peer is involved - this is scheduled work that is assigned to but doesn't need any input from the peer, so there is no reason not do it just because the peer decides to disconnect at the wrong time.

Member Author

I like that idea too. But what if there is a large amount of orphans left? There shouldn't be a ton, but I don't know how to feel about a peer being granted more than the normal amount of compute because they are going to be disconnected.

Contributor

Right, with ProcessOrphanTx() doing at most one actual validation, it would probably not be good to call it in a loop until all outstanding work is done. Maybe calling it just once (as if it was this peer's next turn) would already help in most cases, but to find out it'd probably be best to get some empirical data from mainnet.

@glozow
Member Author

glozow commented Jan 16, 2025

Followup #31666 wraps up all the comments here except for 2 things which might be a little more involved:

#31397 (comment) (because we need to add a txid index)
#31397 (comment)
