Erlay: bandwidth-efficient transaction relay protocol #18261
Conversation
How does it react to diverse network policies?
Concept ACK
Concept ACK
Thanks for doing this! Excellent work!
When it comes to network policies, I'm using the same code originally used by regular gossip ("Determine transactions to relay" in net_processing.cpp). So nothing should be lost or sent wastefully as a result of policy discrepancies.
@luke-jr I'm understanding your question as being inspired by earlier 'mempool sync' ideas that would bleed bandwidth if there was a long-lasting mempool discrepancy. Erlay isn't a mempool sync. It uses a way of communicating lists of things you want to relay which only takes bandwidth proportional to the difference rather than the total size. So there is no ongoing bandwidth usage due to differences in mempool content. Bandwidth used is roughly A*n_tx_relayed + B*peers*(difference_in_tx_relayed) + C*peers, for some constants A, B, C. If a peer has a radically different relay policy than you, it works fine and continues to keep bandwidth below what usage would be without Erlay, even though the Erlay savings itself comes largely from eliminating data that both sides would send. Does that answer your question?
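To make the cost model above concrete, here is a tiny illustrative helper that evaluates the A*n_tx_relayed + B*peers*diff + C*peers expression. The function name and the constants plugged in are made up for illustration; they are not measured values from the PR.

```cpp
#include <cassert>

// Hypothetical evaluation of the bandwidth model from the comment above:
// total ≈ A * tx_relayed + B * peers * diff_per_peer + C * peers.
// A, B, C are placeholder constants, not real measurements.
double ErlayBandwidthEstimate(double tx_relayed, double peers, double diff_per_peer,
                              double A, double B, double C) {
    return A * tx_relayed + B * peers * diff_per_peer + C * peers;
}
```

Note that the per-peer term scales with the *difference* in relayed transactions, not with mempool size, which is why a persistent policy discrepancy does not cause unbounded bandwidth use.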
@naumenkogs When trying out this PR I ran into two small testing issues:

@practicalswift The unit test minisketch binaries actually run forever. I need to fix that.
I did some robustness testing of this code by pulling in PRs #17989. You can reproduce the issue by pulling in the commits from those PRs and simply running:
The issue will be hit within a few seconds. Perhaps this UUM is the reason for the intermittent test failure you're seeing? I encourage everybody to review (or at least Concept ACK :)) #17989.
Updated from 350029c to 833fa13 (compare)
Needs rebase :)
I made some recent changes to make sure it can be plugged into a real network. @practicalswift I was planning to rebase once #18044 is merged.
I ran 2 non-reachable nodes for around 24 hours, one is regular wtxid-node connected to legacy nodes, one is Erlay node connected to reconciliation-supporting nodes on the same host. Bandwidth breakdown (in megabytes):
Notice the overall 40% bandwidth saving.
More experiments: now I ran the same 2 nodes for 24 hours, but connected to 16 instead of 8 nodes (yes, all 16 peers of the Erlay node support reconciliation). The legacy wtxid-relay node spent 150 megabytes for announcements, while the Erlay node spent 24 megabytes. Since these 2 days might have had different activity, it makes sense to compare the growth. Legacy grew 2.23x (I guess due to more INVs in total), Erlay grew 1.8x. So, as expected, not only does it provide decent savings compared to legacy, it also grows more slowly with connectivity. Based on the following facts I also guess that there's no point in tweaking Erlay further for better performance:
Finally, I measured delay in receiving transactions. I take the earliest time I received a transaction, and take latency from there, comparing to:
I think this is expected and fine, as we discuss in the paper, but let me know if you think differently.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
A sketch is a representation of a list of transaction IDs, which enables reconciliation (an efficient sync of lists between peers).
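As a toy illustration of why a sketch enables bandwidth-proportional-to-difference syncing (this is NOT minisketch, which uses PinSketch over a finite field): a capacity-1 "sketch" can be just the XOR of all IDs in a set. XORing two such sketches cancels every shared ID, leaving the single differing ID, so only one sketch-sized message is exchanged regardless of set size.

```cpp
#include <cstdint>
#include <vector>

// Toy capacity-1 sketch: XOR-fold of all 64-bit IDs in a set.
// If two sets differ by exactly one element, XORing their sketches
// recovers that element; shared elements cancel out pairwise.
uint64_t ToySketch(const std::vector<uint64_t>& ids) {
    uint64_t acc = 0;
    for (uint64_t id : ids) acc ^= id;
    return acc;
}
```

Real sketches generalize this to differences of size up to their capacity, at a cost linear in capacity rather than in set size.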
When the time comes, a node computes a sketch of the local transactions based on the parameters sent in the reconciliation request, sends that sketch to the requestor, and clears the local state.
This currently unused function is supposed to be used once a reconciliation round is done. It cleans the state corresponding to the passed reconciliation.
When the sketches from both sides are combined successfully, the diff is produced. This diff can then (together with the local txs) be used to identify which transactions are missing locally and remotely.
Send/request missing transactions, clear the state, and send a finalization message.
If, after decoding a reconciliation sketch, it turns out to be insufficient to find the set difference, request an extension. Store the initial sketches so that we are able to process the extension sketch.
If a peer failed to reconcile based on our initial response sketch, they will ask us for a sketch extension. Store this request so we can respond later.
Add 2 variables tracking reconciliation state: (1) a recon set snapshot and (2) a capacity snapshot. (1) is used to store transactions that arrived after we sent out an initial sketch but before the reconciliation is over, since these transactions should not go into a sketch extension. (2) is used to set the capacity of the extended sketch, because it provides a good estimate (efficiency/failure tradeoff).
Compute a sketch with extra capacity and send those extra syndromes to the peer. It is safe to respond to an extension request without a delay because these requests are already limited by initial requests.
If a peer responded to our request with a sketch extension, attempt to decode it to find missing transactions by combining it with the initial sketch. On success, share/request the missing transactions. On failure, send all the transactions we had for the peer.
At the end of a reconciliation round, a peer may ask us for transactions by their short ID. Add a function for a local short_id->wtxid lookup.
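The short_id->wtxid lookup described above can be sketched as a plain hash-map lookup. Note the hash function below is a placeholder bit-mixer purely for illustration; BIP-330 derives real short IDs from a salted hash, and the names `ShortId`/`LookupByShortId` are hypothetical, not from the PR.

```cpp
#include <cstdint>
#include <unordered_map>

// Placeholder short-ID derivation (NOT BIP-330's actual construction):
// mix a wtxid prefix with a per-connection salt down to 32 bits.
uint32_t ShortId(uint64_t wtxid_prefix, uint64_t salt) {
    uint64_t x = wtxid_prefix ^ salt;
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return static_cast<uint32_t>(x);
}

// Per-round local map from short ID back to the full wtxid (prefix here).
using ShortIdMap = std::unordered_map<uint32_t, uint64_t>;

// Returns true and fills wtxid_out if the short ID is known locally.
bool LookupByShortId(const ShortIdMap& m, uint32_t short_id, uint64_t& wtxid_out) {
    auto it = m.find(short_id);
    if (it == m.end()) return false;
    wtxid_out = it->second;
    return true;
}
```

The map only needs to live for the duration of one reconciliation round, since short IDs are only meaningful relative to that round's salt and set.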
Once a peer tells us reconciliation is done, we should behave as follows:
- if it was successful, just respond with the transactions they asked for by short ID.
- if it was a full failure, respond with all local transactions from the reconciliation set snapshot.
- if it was a partial failure (only the low or high part failed after a bisection), respond with all transactions which were asked for by short ID, and announce local txs which belong to the failed chunk.
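The three finalization outcomes above map naturally onto a switch. This is a hedged sketch only: the enum values, function name, and response strings are illustrative stand-ins, not the PR's actual types.

```cpp
#include <string>

// Hypothetical outcome of a reconciliation round, as reported by the peer.
enum class ReconResult { SUCCESS, FULL_FAILURE, PARTIAL_FAILURE };

// Illustrative dispatch over the three cases described in the commit message.
std::string ResponseFor(ReconResult r) {
    switch (r) {
        case ReconResult::SUCCESS:
            // Only the transactions the peer requested by short ID.
            return "send requested-by-short-id";
        case ReconResult::FULL_FAILURE:
            // Fall back to announcing the whole snapshot set.
            return "announce snapshot set";
        case ReconResult::PARTIAL_FAILURE:
            // Requested txs plus announcements for the failed bisection chunk.
            return "send requested + announce failed chunk";
    }
    return "";
}
```

An exhaustive switch over the result enum makes it hard to silently miss a case if a new outcome is ever added.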
making my way through the first couple commits
// Reconciliation is supported only when wtxid relay is supported, and only
// for those connections which (at least might) support transaction relay.
if (pfrom.IsFullOutboundConn() || pfrom.IsInboundConn() || pfrom.IsManualConn()) {
can you extract this into its own CNode function with a name that captures the intent & a case statement for all the connection types? you can see ExpectServicesFromConn as an example.
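The extraction being suggested might look something like the following. The function name is hypothetical, and the ConnectionType values mirror Bitcoin Core's connection types only loosely; this is a sketch of the reviewer's idea, not the PR's eventual code.

```cpp
// Simplified stand-in for Bitcoin Core's connection-type enum.
enum class ConnectionType {
    INBOUND, OUTBOUND_FULL_RELAY, MANUAL, FEELER, BLOCK_RELAY, ADDR_FETCH
};

// Named predicate with an explicit case per connection type, replacing the
// inline IsFullOutboundConn() || IsInboundConn() || IsManualConn() check.
bool ConnectionSupportsTxReconciliation(ConnectionType t) {
    switch (t) {
        case ConnectionType::INBOUND:
        case ConnectionType::OUTBOUND_FULL_RELAY:
        case ConnectionType::MANUAL:
            return true;  // these connections (at least might) relay txs
        case ConnectionType::FEELER:
        case ConnectionType::BLOCK_RELAY:
        case ConnectionType::ADDR_FETCH:
            return false; // never relay transactions
    }
    return false;
}
```

An exhaustive switch forces a conscious decision whenever a new connection type is introduced, which is the intent behind the ExpectServicesFromConn pattern the reviewer points to.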
flood_to = true;
}

uint64_t local_salt = peer->m_local_recon_salt;
this salt calculation (and probably also the flood_to logic) seems like an example of code that can be extracted into the reconciliation module rather than added to ProcessMessage.
it would make sense to me for the flow to be something like:
- ProcessMessage does internal state checks (m_tx_relay exists, wtxid is enabled)
- extract the message from vRecv
- pass it along into the module to handle
- module returns saying "no-op" or "here is a ReconState object"
- if it exists, ProcessMessage stores the object on the peer

I suspect we could go even further, but this would be a start. Ideally the module would abstract this nuanced Erlay-specific logic away from net processing, even if it's per-peer. The txrequest module is a good example for this, e.g. the PeerInfo struct and how it's applied within the module to make choices, while the decisions are surfaced to net_processing.
Yeah, I'm planning to do something similar in the next couple days. That's why I marked the PR draft for now :)
Thank you for the design suggestion!
m_connman.ForEachNode([this, &result, &skip_node](const CNode* pnode) {
    if (!pnode->m_tx_relay) return;
    if (pnode->GetId() == skip_node.GetId()) return;
    if (!pnode->IsFullOutboundConn() && !pnode->IsManualConn()) return;
I find (!A && !B) to be very confusing. Again, would be good to extract into its own function with switch statement.
@@ -914,6 +929,28 @@ void PeerManagerImpl::PushNodeVersion(CNode& pnode, int64_t nTime)
    }
}

size_t PeerManagerImpl::GetFloodingOutboundsCount(const CNode& skip_node) const
instead of recalculating whenever we get a SENDRECON message, could this logic be significantly simplified by just storing a count that gets updated when nodes do something to change their state? the characteristics being checked are all established early in the connection (connection type, m_tx_relay struct, flood_to being set)
I think this should remove the need for the "skip node" logic in this function, but if not, the node being processed can be checked and then subtract one if needed.
An update here. This way we could save the review effort of those contributors who are less concerned about modularity: first do that, then proceed to other questions. Once that is done, I'll get back here and welcome full review again.
@naumenkogs I took a first look and I see that most of the comments I've left here are still applicable. I'm happy to do a deeper dive on the branch once you've addressed the outstanding comments.
static const auto RECON_SALT_HASHER = TaggedHash(RECON_STATIC_SALT);
uint256 full_salt = (CHashWriter(RECON_SALT_HASHER) << salt1 << salt2).GetSHA256();

peer->m_recon_state = MakeUnique<ReconState>(recon_requestor, recon_responder, flood_to, full_salt.GetUint64(0), full_salt.GetUint64(1));
Please use std::make_unique in new code.
Closing in favor of #21515. I believe I addressed most of the comments from here.
#21515 is ready for review now
This is an implementation of Erlay, using primitives from BIP-330 (see the updated spec here). Please refer to these two to understand the design. My talk on the topic is here.
Erlay uses both flooding (announcing using INV messages to all peers) and reconciliation to announce transactions. Flooding is expensive, so Erlay seeks to use it sparingly and in strategic locations - only well-connected publicly reachable nodes flood transactions to other publicly reachable nodes via outbound connections. Since every unreachable node is directly connected to several reachable nodes, this policy ensures that a transaction is quickly propagated to be within one hop from most of the nodes in the network.
All transactions not propagated through flooding are propagated through efficient set reconciliation. To do this, every node keeps a reconciliation set for each peer, into which it places the transactions that would have been announced using INV messages absent this protocol. Every 2 seconds, every node chooses a peer from its outbound connections in a predetermined order to reconcile with, resulting in both sides learning the transactions known to the other side. After every reconciliation round, the corresponding reconciliation set is cleared.
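The "predetermined order" above amounts to a round-robin over outbound peers. A minimal sketch under that assumption (the ReconQueue class and its method names are hypothetical, not the PR's actual code):

```cpp
#include <vector>
#include <cstddef>

// Round-robin queue of outbound peer IDs; every reconciliation interval
// (2 seconds in the description above) the next peer in line is chosen.
class ReconQueue {
    std::vector<int> m_peers;
    size_t m_next{0};
public:
    void Add(int peer_id) { m_peers.push_back(peer_id); }

    // Returns the next peer to reconcile with and advances the cursor.
    // Assumes the queue is non-empty.
    int NextPeer() {
        int id = m_peers[m_next];
        m_next = (m_next + 1) % m_peers.size();
        return id;
    }
};
```

A fixed cycle like this bounds how long any one peer's reconciliation set can grow before it is drained, which keeps per-round sketches small.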
I think both the paper and the BIP motivate the changes, but I'll mention them briefly once again here:
Obviously looking for review, let's try to start with a high-level concerns, and keep nits for later.
P.S.
Please don't be scared of the 8,000 LOC added. 7,000 of them are minisketch, added as a subtree.
P.P.S.
My experiments running this code live (slightly outdated), with a script to replicate the experiment: here1 and here2.