
Conversation

naumenkogs
Member

@naumenkogs naumenkogs commented Mar 4, 2020

This is an implementation of Erlay, using primitives in the BIP-330 (see the updated spec here). Please refer to these two to understand the design. My talk on the topic is here.

Erlay uses both flooding (announcing using INV messages to all peers) and reconciliation to announce transactions. Flooding is expensive, so Erlay seeks to use it sparingly and in strategic locations - only well-connected publicly reachable nodes flood transactions to other publicly reachable nodes via outbound connections. Since every unreachable node is directly connected to several reachable nodes, this policy ensures that a transaction is quickly propagated to be within one hop from most of the nodes in the network.

All transactions not propagated through flooding are propagated through efficient set reconciliation. To do this, every node keeps a reconciliation set for each peer, in which transactions are placed which would have been announced using INV messages absent this protocol. Every 2 seconds every node chooses a peer from its outbound connections in a predetermined order to reconcile with, resulting in both sides learning the transactions known to the other side. After every reconciliation round, the corresponding reconciliation set is cleared.
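The per-peer set bookkeeping described above can be sketched roughly as follows. This is an illustrative toy model, not the PR's actual API: all names (`ReconPeer`, `ReconManager`, `NextReconRound`) are invented here, and the actual reconciliation exchange is omitted.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>

// Each peer gets a reconciliation set of wtxids that would otherwise have
// been announced via INV messages.
struct ReconPeer {
    std::set<std::string> recon_set;  // wtxids queued for reconciliation
};

struct ReconManager {
    std::map<int, ReconPeer> peers;  // peer id -> reconciliation state
    std::deque<int> outbound_order;  // predetermined reconciliation order

    void AddToSet(int peer_id, const std::string& wtxid) {
        peers[peer_id].recon_set.insert(wtxid);
    }

    // Called every reconciliation interval (2 seconds in the write-up):
    // pick the next outbound peer in round-robin order, reconcile with it
    // (the actual sketch exchange is omitted), then clear its set.
    int NextReconRound() {
        int peer_id = outbound_order.front();
        outbound_order.pop_front();
        outbound_order.push_back(peer_id);  // rotate for the next round
        peers[peer_id].recon_set.clear();   // set is cleared after the round
        return peer_id;
    }
};
```

The rotation ensures every outbound peer is reconciled with at a predictable cadence, matching the "predetermined order" described above.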

I think both the paper and the BIP motivate these changes, but I'll mention them briefly once again here:

  • save ~40% of the bandwidth consumed by a node
  • increase network connectivity at almost no bandwidth or latency cost
  • improve privacy as a side-effect

Obviously looking for review; let's try to start with high-level concerns and keep nits for later.

P.S.
Please don't be scared of the 8,000 LOC added. 7,000 of them are minisketch, added as a subtree.

P.P.S.
My experiments running this code live (slightly outdated), with a script to replicate the experiment: here1 and here2.

@luke-jr
Member

luke-jr commented Mar 4, 2020

How does it react to diverse network policies?

@Empact
Contributor

Empact commented Mar 4, 2020

Concept ACK

Contributor

@practicalswift practicalswift left a comment


Concept ACK

Thanks for doing this! Excellent work!

@naumenkogs
Member Author

@luke-jr

When it comes to network policies, I'm using the same code originally used by regular gossip ("Determine transactions to relay" in net_processing.cpp). So nothing should be lost or sent wastefully as a result of policy discrepancies.

@gmaxwell
Contributor

gmaxwell commented Mar 8, 2020

@luke-jr I'm understanding your question as being inspired by earlier 'mempool sync' ideas that would bleed bandwidth if there was a long-lasting mempool discrepancy.

Erlay isn't a mempool sync. It uses a way of communicating lists of things you want to relay which only takes bandwidth proportional to the difference rather than the total size. So there is no ongoing bandwidth usage due to differences in mempool content. Bandwidth used is roughly A*txs_relayed + B*peers*(difference_in_txs_relayed) + C*peers, for some constants A, B, C.

If a peer has a radically different relay policy than you, it works fine and continues to keep bandwidth below what usage would be without Erlay, even though the Erlay savings themselves come largely from eliminating data that both sides would send.

Does that answer your question?
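The cost model above can be rendered as a toy calculation. The constants and function names below are made up purely for illustration; the point is the shape of the scaling: flooding costs grow with peers times total transactions, while reconciliation grows with peers times the set difference only.

```cpp
#include <cassert>

// Erlay-style cost: A*txs_relayed + B*peers*diff + C*peers.
// A, B, C are arbitrary illustrative constants, not measured values.
double ErlayBandwidth(double txs_relayed, double peers, double diff,
                      double A = 1.0, double B = 1.0, double C = 1.0) {
    return A * txs_relayed + B * peers * diff + C * peers;
}

// Plain flooding cost: every transaction announced to every peer.
double FloodingBandwidth(double txs_relayed, double peers, double A = 1.0) {
    return A * peers * txs_relayed;
}
```

With 100 transactions, 8 peers, and a set difference of 5, the Erlay-style cost is 100 + 40 + 8 = 148 units versus 800 for flooding, which is the "proportional to the difference rather than the total size" property in miniature.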

@practicalswift
Contributor

practicalswift commented Mar 9, 2020

@naumenkogs When trying out this PR I ran in to two small testing issues:

  • The suffix of the functional test p2p_erlay is .p2p (p2p_erlay.p2p) instead of the expected .py (p2p_erlay.py) :)
  • It seems like make check runs the minisketch binaries test-exhaust and test-exhaust-verify. The running times of these are quite substantial - is there some way to do a quick sanity check as part of make check instead of exhaustive testing? :)

@sipa
Member

sipa commented Mar 9, 2020

@practicalswift The unit test minisketch binaries actually run forever. I need to fix that.

@practicalswift
Contributor

@naumenkogs

I did some robustness testing of this code by pulling in PRs #17989 (ProcessMessage(…) fuzzer) and #18288 (MemorySanitizer), and found a use of uninitialized memory (UUM) that is remotely triggerable.

You can reproduce the issue by pulling in the commits from those PRs and simply running:

$ src/test/fuzz/process_message
…

The issue will be hit within a few seconds: libFuzzer is amazing :)

Notice also how libFuzzer will automatically find the newly added message names (wtxidrelay, sendrecon, reqrecon, sketch, reqbisec and reconcildiff) and probe using them all! The fuzzer harness does not need to be taught about the existence of those :)

Perhaps this UUM is the reason for the intermittent test failure you're seeing?

I encourage everybody to review (or at least Concept ACK :)) #17989 (ProcessMessage(…) fuzzer) and #18288 (MemorySanitizer): having them merged would be great for robustness/security.

@practicalswift
Contributor

Needs rebase :)

@naumenkogs
Member Author

I made some recent changes to make sure it can be plugged into a real network.
Please help with testing this by running a couple of inter-connected Erlay nodes and observing bandwidth (and ideally tx relay latency).

@practicalswift I was planning to rebase once #18044 is merged.

@naumenkogs
Member Author

naumenkogs commented Mar 15, 2020

I ran 2 non-reachable nodes for around 24 hours: one a regular wtxid node connected to legacy nodes, the other an Erlay node connected to reconciliation-supporting nodes on the same host.

Bandwidth breakdown (in megabytes):

| message | Legacy (sent) | Erlay (sent) | Legacy (received) | Erlay (received) |
|---|---|---|---|---|
| inv | 38 | 0.4 | 22 | 5.3 |
| getdata | 6 | 5.7 | 1 | 0 |
| sketch | - | - | - | 1.2 |
| reqrecon, reqbisec, reconcildiff | - | 0.7 | - | - |
| tx, blocktxn | 3 | 0.3 | 75 | 75 |
| total (incl. other) | 48 | 7.1 | 103 | 84 |

Notice the overall ~40% bandwidth saving.
Please help by running similar experiments and sharing bandwidth results :)
Here's the script I hacked together for bandwidth analysis (run nodes with debug=1).
Please sanitize your results before publishing: sometimes there's noisy bandwidth (like extra blocks) for reasons I'm not aware of.

@naumenkogs
Member Author

naumenkogs commented Mar 18, 2020

More experiments: now I ran the same 2 nodes for 24 hours, but connected to 16 nodes instead of 8 (yes, all 16 peers of the Erlay node support reconciliation).

The legacy wtxid-relay node spent 150 megabytes on announcements, while the Erlay node spent 24 megabytes. Since these 2 days might have had different activity, it makes sense to compare the growth: legacy grew 2.23x (I guess due to more INVs in total), Erlay grew 1.8x. So, as expected, not only are the savings decent compared to legacy, they also grow more slowly with connectivity.

Based on the following facts, I also guess that there's little point in tweaking Erlay further for better performance:

  • only 0.0016 of reconciliations failed (fully or partially), meaning we don't underestimate much
  • only 15% of the announcement bandwidth is sketches, meaning we don't overestimate much
  • only 15% of the announcement bandwidth is outbound INVs (which are much more likely to still be a little wasteful, because those peers are better connected than us and likely don't need those announcements)

Finally, I measured the delay in receiving transactions. I take the earliest time I received a transaction and measure latency from there, comparing:

  • the average receiving time across all legacy nodes: 0.92s
  • at the Erlay node: 1.47s

I think this is expected and fine, as we discuss in the paper, but let me know if you think differently.

@DrahtBot
Contributor

DrahtBot commented Mar 20, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  • Sketch is a representation of a list of transaction IDs, which enables reconciliation (an efficient sync of lists between peers).
  • When the time comes, a node computes a sketch of the local transactions based on the parameters sent in the reconciliation request, and sends that sketch to the requestor. Clear local state.
  • This currently unused function is supposed to be used once a reconciliation round is done. It cleans the state corresponding to the passed reconciliation.
  • When the sketches from both sides are combined successfully, the diff is produced. This diff can then (together with the local txs) be used to identify which transactions are missing locally and remotely. Send/request missing transactions, clear the state, and send a finalization message.
  • If, after decoding a reconciliation sketch, it turns out to be insufficient to find the set difference, request an extension. Store the initial sketches so that we are able to process the extension sketch.
  • If a peer failed to reconcile based on our initial response sketch, they will ask us for a sketch extension. Store this request to respond to later.
  • Add 2 variables tracking reconciliation state: (1) a recon set snapshot and (2) a capacity snapshot. (1) is used to store transactions that arrived after we sent out an initial sketch but before the reconciliation is over, since these transactions should not go into a sketch extension. (2) is used to set the capacity of the extended sketch because it provides a good estimate (efficiency/failure tradeoff).
  • Compute a sketch with extra capacity and send those extra syndromes to the peer. It is safe to respond to an extension request without a delay because these requests are already limited by initial requests.
  • If a peer responded to our request with a sketch extension, attempt to decode it to find missing transactions by combining it with the initial sketch. If successful, share/request missing transactions. If it fails, send all the transactions we had for the peer.
  • At the end of a reconciliation round, a peer may ask us for transactions by their short ID. Add a function for a local short_id->wtxid lookup.
  • Once a peer tells us reconciliation is done, we should behave as follows:
    - if it was successful, just respond with the transactions they asked for by short ID;
    - if it was a full failure, respond with all local transactions from the reconciliation set snapshot;
    - if it was a partial failure (only the low or high part failed after a bisection), respond with all transactions asked for by short ID, and announce local txs belonging to the failed chunk.
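The short_id->wtxid lookup mentioned above can be illustrated with a toy version. Note the hedging: BIP-330 specifies a salted hash construction for short IDs, but the stand-in hash, the function names, and the table layout below are all invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Stand-in for the real salted short-id construction: mixes the wtxid
// with the per-connection salt and truncates to 32 bits. Not BIP-330's
// actual hash; purely illustrative.
uint32_t ToyShortId(const std::string& wtxid, uint64_t salt) {
    uint64_t h = salt;
    for (unsigned char c : wtxid) h = h * 1099511628211ULL + c;  // FNV-style mix
    return static_cast<uint32_t>(h);  // truncate to 32 bits
}

// Build a local lookup table from the reconciliation set snapshot, so that
// short IDs requested by the peer can be resolved back to full wtxids.
std::map<uint32_t, std::string> BuildShortIdMap(
    const std::set<std::string>& recon_set, uint64_t salt) {
    std::map<uint32_t, std::string> out;
    for (const auto& wtxid : recon_set) out[ToyShortId(wtxid, salt)] = wtxid;
    return out;
}
```

Because both sides derive the same salt during the handshake, a peer's requested short IDs resolve to the same transactions on each end.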
Contributor

@amitiuttarwar amitiuttarwar left a comment


making my way through the first couple commits


// Reconciliation is supported only when wtxid relay is supported, and only
// for those connections which (at least might) support transaction relay.
if (pfrom.IsFullOutboundConn() || pfrom.IsInboundConn() || pfrom.IsManualConn()) {
Contributor


can you extract this into its own CNode function with a name that captures the intent & a case statement for all the connection types? you can see ExpectServicesFromConn as an example.
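The suggested extraction might look something like the sketch below. The predicate name `SupportsRecon`, and the reduced `ConnectionType` enum, are assumptions made here for illustration; the enum only mirrors a subset of the connection types in Bitcoin Core's net.h, and the real version would live on CNode.

```cpp
#include <cassert>

// Illustrative subset of Bitcoin Core's connection types.
enum class ConnectionType { INBOUND, OUTBOUND_FULL_RELAY, MANUAL, FEELER, BLOCK_RELAY };

// Named predicate with an explicit switch over all connection types,
// in the spirit of ExpectServicesFromConn: makes the intent of the
// IsFullOutboundConn() || IsInboundConn() || IsManualConn() check explicit.
bool SupportsRecon(ConnectionType type) {
    switch (type) {
        case ConnectionType::INBOUND:
        case ConnectionType::OUTBOUND_FULL_RELAY:
        case ConnectionType::MANUAL:
            return true;  // connections that (at least might) relay transactions
        case ConnectionType::FEELER:
        case ConnectionType::BLOCK_RELAY:
            return false; // no transaction relay, so no reconciliation
    }
    return false; // unreachable, keeps compilers happy
}
```

A full switch forces future connection types to be classified explicitly, which is the usual argument for this pattern over an ad-hoc boolean expression.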

flood_to = true;
}

uint64_t local_salt = peer->m_local_recon_salt;
Contributor


this salt calculation (and probably also the flood_to logic) seems like an example of code that can be extracted into the reconciliation module rather than added to ProcessMessage.

it would make sense to me for the flow to be something like:

  • ProcessMessage does internal state checks (m_tx_relay exists, wtxid is enabled)
  • extract the message from vRecv
  • pass it along into the module to handle
  • module returns saying "no-op" or "here is a ReconState object"
  • if it exists, ProcessMessage stores the object on the peer

I suspect we could go even further, but this would be a start. Ideally the module would abstract this nuanced Erlay-specific logic away from net processing, even if it's per-peer. The txrequest module is a good example of this, e.g. the PeerInfo struct and how it's applied within the module to make choices, with the decisions surfaced to net_processing.

Member Author


Yeah, I'm planning to do something similar in the next couple of days. That's why I marked the PR as a draft for now :)

Thank you for the design suggestion!

m_connman.ForEachNode([this, &result, &skip_node](const CNode* pnode) {
if (!pnode->m_tx_relay) return;
if (pnode->GetId() == skip_node.GetId()) return;
if (!pnode->IsFullOutboundConn() && !pnode->IsManualConn()) return;
Contributor


I find (!A && !B) very confusing. Again, it would be good to extract this into its own function with a switch statement.

@@ -914,6 +929,28 @@ void PeerManagerImpl::PushNodeVersion(CNode& pnode, int64_t nTime)
}
}

size_t PeerManagerImpl::GetFloodingOutboundsCount(const CNode& skip_node) const
Contributor


Instead of recalculating whenever we get a SENDRECON message, could this logic be significantly simplified by just storing a count that gets updated when nodes do something to change their state? The characteristics being checked are all established early in the connection (connection type, the m_tx_relay struct, flood_to being set).

I think this would remove the need for the "skip node" logic in this function, but if not, the node being processed can be checked and one subtracted if needed.
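The maintained-counter idea suggested above might be sketched like this. All names here are hypothetical (the PR uses `GetFloodingOutboundsCount` with ForEachNode instead); the point is only the shape of the bookkeeping.

```cpp
#include <cassert>

// Keep a running count of flooding outbounds, updated on connection state
// changes, instead of iterating all nodes on every SENDRECON message.
struct FloodingOutboundCounter {
    int count{0};

    // Called once when a connection is classified as a flooding outbound
    // (connection type, m_tx_relay, and flood_to are all set early on).
    void OnPeerBecameFloodingOutbound() { ++count; }

    // Called on disconnect; only decrement if the peer was counted.
    void OnPeerDisconnected(bool was_flooding_outbound) {
        if (was_flooding_outbound) --count;
    }

    // Equivalent of the "skip node" logic: exclude the node currently
    // being processed by subtracting one if it is itself counted.
    int CountExcluding(bool current_is_flooding) const {
        return current_is_flooding ? count - 1 : count;
    }
};
```

This trades a small amount of state-change bookkeeping for O(1) reads, at the cost of having to keep the counter consistent across every path that changes a peer's flooding status.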

@naumenkogs
Member Author

An update here.
Based on some suggestions, I moved reconciliation to a separate module.
I suggest doing some initial review of that in the PR in my repo. Could you folks help to evaluate my new approach? @jnewbery @amitiuttarwar @sipa @jonatack (everyone else is welcome too)

This way we could save the review efforts of those contributors who are less concerned about the modularity? First do that, then proceed to other questions.

Once that is done, I'll get back here and welcome full review again.

@amitiuttarwar
Contributor

@naumenkogs I took a first look and I see that most of the comments I've left here are still applicable. I'm happy to do a deeper dive on the branch once you've addressed the outstanding comments.

static const auto RECON_SALT_HASHER = TaggedHash(RECON_STATIC_SALT);
uint256 full_salt = (CHashWriter(RECON_SALT_HASHER) << salt1 << salt2).GetSHA256();

peer->m_recon_state = MakeUnique<ReconState>(recon_requestor, recon_responder, flood_to, full_salt.GetUint64(0), full_salt.GetUint64(1));

@naumenkogs
Member Author

Closing in favor of #21515. I believe I addressed most of the comments from here.

@naumenkogs naumenkogs closed this Mar 23, 2021
@naumenkogs
Member Author

#21515 is ready for review now

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022