-
Notifications
You must be signed in to change notification settings - Fork 37.8k
p2p: Diversify automatic outbound connections with respect to networks #27213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: Diversify automatic outbound connections with respect to networks #27213
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
We have 3 open questions that we are interested in hearing feedback from reviewers. We have considered many potential interactions and have laid out a couple specific ones along with tradeoffs of different approaches. IBDThe first is regarding the interactions between this patch & IBD. Peers on privacy network might be slower than clearnet peers (eg. Tor or I2P). Our current proposal does not have any special case logic for disabling the network prioritization during IBD. This could mean that our overall IBD speed is reduced by not having the fastest peers possible. The network peers could also be intermittently disconnected for stalling & get the node into a loop where it, for example, disconnects a Tor peer, connects to an IPV4 peer, re-adds a Tor peer and then keeps repeating. Finally, there could be interactions with the slow peers causing delays in the response time of the faster peers based on the specifics of our IBD logic. One solution is to just disable the network-specific outbound logic during IBD. On the other hand, a node can get slow peers by random chance anyways. Whether randomly connecting to altnet peers or because a clearnet peer just happens to be slow. If they are slow, but not slow enough to be evicted due to the stalling limit, that will already have a significant impact on the IBD rate. The issues and potential solutions with improving IBD can be seen as tangential to this proposal, and we could favor not adding custom workarounds that serve as a bandaid to mask the root cause. Enabling alternative networks during IBD is already known to slow down the sync dramatically, and we can just add a warning in the documentation to help users become more aware. Peers protected from evictionThis patch adds eviction protection for special network peers, but it only gets triggered when we have an extra outbound-full-relay peer (aka 9). This occurs when we detect a stale tip or because we added another special network peer. In these situations we’d want to ensure that we have at least one non-protected peer that can be disconnected so the node can return to the steady state of 8 full-relay outbound peers. Currently, we protect 4 peers via Since it would require adding support for at least one more network to run into this issue, we have not addressed it in this patch. There are multiple viable mitigation strategies: One option is to reduce the number of peers we protect with the headers logic (via Network peer protectionA potential concern of this patch is that the logic that protects network peers from eviction could be “over-protective”. For example, keeping a Tor connection only because it is the only Tor connection, not because it’s providing us meaningful data. Any concerns around this are completely bounded by the fact that the protection logic is limited to when we have an extra peer, either through stale tip detection or through the new temporary connections for network specific peers. However, if there is still concern, @ajtowns had an interesting idea to increase our likelihood of adding a 2nd peer from alternate networks. We could update the stale tip logic to choose the network of the extra outbound differently. Right now, the network odds reflect the proportions represented in Increasing the chances of connecting to a privacy network in that context has two benefits: This could be treated as a separate chunk of work to be evaluated in a follow-up, or it could be incorporated into this PR if reviewers believe it could help mitigate concerns about edge case possibilities. |
c9bf51b
to
ee8c258
Compare
Concept ACK. This was somewhere on my TODO, thanks for picking it up!
Hmm, this is maybe ok. Another approach would be to try to open to e.g. Tor while still having < 8 outbound connections. To counter the issue of staying with < 8 connections and not being able to open to Tor because e.g. the Tor proxy is broken, then we could give up after a few attempts and then fall back to the old way of filling with random (aka IPv4 ;-)) addresses. If we overshoot the 8 connections (like described above), then for some time we will have more, until eviction kicks in. How much time is that exactly? If that is short then it would be sub-optimal to open a IPv4 connection and drop it shortly after in cases where Tor connectivity is working smoothly. If it is long then we have practically increased the limit of 8 which will cause more traffic.
Why it doesn't make sense? I think it is an extra effort to obtain IPv4 addresses if one has IPv6 addresses and the other way around. Why impossible in some network environments? After all you know if you are going to attempt to connect to
+1
I think there is no need to add special logic during IBD (e.g. disable this PR during IBD). Clearnet peers can also be slow and non-clearnet peers may be fast-enough. IBD should be able to handle slow/fast peers already without special hand-holding.
I think we don't need to worry about this now because even if we treat IPv4 and IPv6 as separate networks and all 4 from
I don't see it as "over-protective". Can we not have some metric of whether a peer is "providing us a meaningful data"? I am ok to deal with this in a followup. Taking a step back, the root problem here is that addrman has many IPv4 addresses and few I2P ones (for example). If it would have equal number of addresses from each network, then this problem would not exist because then picking up a random one would have a 1/5 chance of selecting any network (thus there would not be any bias towards IPv4). Would it be simpler and solve the problem equally well if we only change addrman internally to return an address from any given network with a 1/5 chance? Then no need to change the eviction logic or the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! My thoughts on just some of your points:
I think we don't need to worry about this now because even if we treat IPv4 and IPv6 as separate networks and all 4 from
m_protected
are from one network (e.g. IPv4), then we have 4 more slots for IPv6, Tor, I2P, CJDNS.
I think that scenario would already be problematic: In this case, all 8 outbound peers could be protected, and while there would be no need to add network-related extra peers, we could still add an extra outbound due to stale tip (StartExtraBlockRelayPeers()
), and then we wouldn't have a peer to evict.
I don't see it as "over-protective". Can we not have some metric of whether a peer is "providing us a meaningful data"? I am ok to deal with this in a followup.
I agree. We have lots of different metrics for that (e.g. all Misbehaving()
calls, ConsiderEviction()
for stale tip, inactivity checks) that would apply to network-specific peers in the same way as for unprotected peers. Maybe some of these can be improved, but I think this should be unrelated to this PR.
Would it be simpler and solve the problem equally well if we only change addrman internally to return an address from any given network with a 1/5 chance? Then no need to change the eviction logic or the ThreadOpenConnections logic. And no need to guarantee explicitly in the code that at least one connection exists to each network. It will happen by itself most of the time.
That would certainly be simpler (although I found the necessary non-addrman changes in this PR to be surprisingly small), but as you wrote we'd only have a connection to each network with a certain probability. I did a quick simulation (too lazy to try and do the math), and it appears that with 5 networks and 8 outbounds, the chance to be connected to all of them is just 32% (62% for 4 nets, 88% for 3, 99% for 2).
I would have to think a bit more about the consequence on eclipse attacks for this - for example, a small network like CJDNS is probably pretty easy to eclipse currently, and if we configure IPv4 and CJDNS to be reachable and choose each connection with a base probability of 50% to CJDNS, that could be pretty scary...
But just noting that the two approaches aren't exclusive and implementing both would be possible.
One interesting idea for possible follow-ups would be optional logic that ties certain behavior to networks (e.g. broadcasting wallet transactions only to peers from anonymity networks). In order for that to work, it would be good to have stronger guarantee that we are actually connected to these networks.
"... then start disconnecting from
I think it goes like this: (with 5 networks) the chance of not selecting IPv4 for the first peer is I think this is enough to ditch the idea of random-chance offtopic:
Yes, please! :) To take this one step further, it would be even better to create a temporary connection(s) to Tor/I2P, broadcast our transaction(s) and close the connection(s) (with I2P transient address). This way, if we broadcast another tx after e.g. 1 hour the recipient will not be able to conclude that both transactions come from the same origin.
Would not be needed with temporary tx-broadcast connections. |
I agree with that conclusion, but I think you can't just add the probabilities like that. I think the correct formula is |
Concept ACK |
Concept ACK: I think, having network diversity in outbound connections is helpful and improves partition resistance.
I also not sure if I agree with counting IPv4 and IPv6 as one network and I think this could need a bit more reasoning. Is it because they are both clearnet? Is it a workaround to don't have 8 protected peers if all networks are enabled (which we'd also face when adding another network like yggdrasil as you mentioned)? I had the feeling that protecting only one IPv4 or IPv6 could, for example, reduce the number of inbound connections IPv6-only nodes get. However, I'm not sure if that's really the case (I need to think about this more). |
We thought about this, of course, because it's the most straightforward approach. However, don't like it as much as the approach here for several reasons:
I guess it boils down to that in general, I think temporarily having more outbound connections than our target is preferable to temporarily having less connections. |
I think our reasoning was:
So basically, we saw no clear upsides, and some downsides, and that's why we decided to add it. It had nothing to do with the number of 8 protected peers. I'm sure we could find other solutions around that edge case problem, e.g. adding logic to just not try an extra outbound in the first place when we could get into a position where we don't have an unprotected peer to evict, or softening the protection during eviction as vasild suggested. |
I'm not sure it has any influence on this pr, but did you consider doing something similar for block-relay-only peers? Something like: connect to one generic block-relay-only peer, plus an additional block-relay-only peer for every network we support? I think that should only improve things, including during IBD. It wouldn't do anything to improve tx propagation of course, so wouldn't make this PR redundant. |
thank you everyone for the thoughtful feedback! some questions & responses - RE: logic to initially overshoot number of connections
Since we already have short-lived connections through feelers, rotating extra block relay only connections, and stale tip detection, can you clarify the disadvantage of adding another? Especially since we wouldn’t expect this new mechanism to occur frequently, does it seem acceptable or intrinsically different somehow? RE: ipv4 vs ipv6@vasild: Are you suggesting that treating them as separate networks increases the cost of attack? Does the delta in energy required from the attacker feel meaningful?
Hm interesting. Future improvementsAgreed these are orthogonal to the current proposal, but are very interesting ideas worth further exploration! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Also, should we have (at least) one anchor per network instead of only two "generic" ones? |
added a commit that increases fuzz coverage for select by network
yeah, def think it's worth considering in relation to increasing block-relay-peers. @mzumsande & I have started brainstorming & plan to open an issue once we formulate some initial ideas. in the meanwhile, feel free to reach out directly if you're interested in discussing further :) |
You did not answer my question "How much time is that exactly?" but your reply seems to imply an answer "short", correct me if I misunderstand. The disadvantage of doing that is "it would be sub-optimal to open a IPv4 connection and drop it shortly after in cases where Tor connectivity is working smoothly".
Yes.
That question is a bit vague. I guess "yes" for some values of "meaningful". |
17e7054 doc: clarify new_only param for Select function (Amiti Uttarwar) b0010c8 bench: test select for a new table with only one address (Amiti Uttarwar) 9b91aae bench: add coverage for addrman select with network parameter (Amiti Uttarwar) 22a4d14 test: increase coverage of addrman select (without network) (Amiti Uttarwar) a98e542 test: add addrman test for special case (Amiti Uttarwar) 5c8b4ba tests: add addrman_select_by_network test (Amiti Uttarwar) 6b22928 addrman: add functionality to select by network (Amiti Uttarwar) 26c3bf1 scripted-diff: rename local variables to match modern conventions (Amiti Uttarwar) 4880641 refactor: consolidate select logic for new and tried tables (Amiti Uttarwar) ca2a9c5 refactor: generalize select logic (Amiti Uttarwar) 052fbcd addrman: Introduce helper to generalize looking up an addrman entry (Amiti Uttarwar) 9bf078f refactor: update Select_ function (Amiti Uttarwar) Pull request description: For the full context & motivation of this patch, see #27213 This is joint work with mzumsande. This PR adds functionality to `AddrMan::Select` to enable callers to specify a network they are interested in. Along the way, it refactors the function to deduplicate the logic, updates the local variables to match modern conventions, adds test coverage for both the new and existing `Select` logic, and adds bench tests for the worst case performance of both the new and existing `Select` logic. This functionality is used in the parent PR. ACKs for top commit: vasild: ACK 17e7054 brunoerg: re-ACK 17e7054 ajtowns: ACK 17e7054 mzumsande: Code Review ACK 17e7054 Tree-SHA512: e99d1ce0c44a15601a3daa37deeadfc9d26208a92969ecffbea358d57ca951102d759734ccf77eacd38db368da0bf5b6fede3cd900d8a77b3061f4adc54e52d8
7471e55
to
8f91e58
Compare
Connman uses this new map to keep a count of active OUTBOUND_FULL_RELAY and MANUAL connections. Unused until next commit. Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
d8b3d90
to
be73855
Compare
thanks for the reviews & feedback @naumenkogs, @vasild, @willcl-ark & @fanquake I've updated this PR to incorporate all the open feedback & fix up the commit messages. everything should be addressed except for should be ready for re-review. thanks! |
If a peer is the only one of its network, protect it from eviction. This improves the diversity of outbound connections with respect to reachable networks. Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
Diversify outbound connections with respect to networks: Every ~5 minutes, try to add an extra connection to a reachable network which we currently don't have a connection to. This is done defensively - only try management with respect to networks after all existing outbound slots are filled. The resulting situation with an extra outbound peer will be handled by the extra outbound eviction logic, which protects peers from eviction if they are the only ones for their network. Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
be73855
to
1b52d16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 1b52d16
ACK 1e65aae Updates shown by |
ACK 1b52d16 |
@willcl-ark note that you ACK'd the wrong commit here. |
7463d25 doc: Add release note (Amiti Uttarwar) Pull request description: release notes for bitcoin#27213 ACKs for top commit: mzumsande: ACK 7463d25 Tree-SHA512: 16c479774ed9242d8d044d08cc919550ccd07020423a3dcd99f07dad36e4dafd8243dc47f9f7f0c8eedcb53efd85ec65afedba56422452f637d313ec7c901520
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-merge ACK.
The last commit, p2p: network-specific management of outbound connections, looks like it could use test coverage.
I've been testing the changes here these past two days, with added/improved logging to monitor the new behavior.
@@ -1895,6 +1939,9 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect) | |||
} | |||
LogPrint(BCLog::NET, "Making feeler connection to %s\n", addrConnect.ToStringAddrPort()); | |||
} | |||
|
|||
if (preferred_net != std::nullopt) LogPrint(BCLog::NET, "Making network specific connection to %s on %s.\n", addrConnect.ToStringAddrPort(), GetNetworkName(preferred_net.value())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions for this log entry:
-
use
LogPrintLevel
-
replace "Making" with "Trying to make" (like the anchor connection logging), as connections to privacy networks regularly fail
-
add missing hyphen to "network specific" and log the network just after it, instead of at the end where it is less useful (and a bit less clear, e.g. "on onion")
-
log the IP address only if the
-logips
config option is set -
add the connection type
possible diff
-
- if (preferred_net != std::nullopt) LogPrint(BCLog::NET, "Making network specific connection to %s on %s.\n", addrConnect.ToStringAddrPort(), GetNetworkName(preferred_net.value()));
+ if (preferred_net.has_value()) {
+ LogPrintLevel(BCLog::NET, BCLog::Level::Debug, "Trying to make a network-specific %s connection %s(%s)\n",
+ GetNetworkName(preferred_net.value()),
+ fLogIPs ? strprintf("to %s ", addrConnect.ToStringAddrPort()) : "",
+ ConnectionTypeAsString(ConnectionType::OUTBOUND_FULL_RELAY));
+ }
Example output:
[net:debug] Trying to make a network-specific cjdns connection to [fc70:de9d:7fe2:b32:5828:1a3c:d0f:83ec]:8333 (outbound-full-relay)
@@ -5159,6 +5161,9 @@ void PeerManagerImpl::EvictExtraOutboundPeers(std::chrono::seconds now) | |||
if (state == nullptr) return; // shouldn't be possible, but just in case | |||
// Don't evict our protected peers | |||
if (state->m_chain_sync.m_protect) return; | |||
// If this is the only connection on a particular network that is | |||
// OUTBOUND_FULL_RELAY or MANUAL, protect it. | |||
if (!m_connman.MultipleManualOrFullOutboundConns(pnode->addr.GetNetwork())) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We log the disconnection and protection actions in EvictExtraOutboundPeers()
and it would be good to log this action as well.
While testing and monitoring the changes in this pull, I found the following diff to be helpful.
- if (!m_connman.MultipleManualOrFullOutboundConns(pnode->addr.GetNetwork())) return;
+ if (!m_connman.MultipleManualOrFullOutboundConns(pnode->addr.GetNetwork())) {
+ LogPrintLevel(BCLog::NET, BCLog::Level::Debug,
+ "Protecting from eviction last remaining outbound %s peer=%d that relays transactions%s (%s)\n",
+ GetNetworkName(pnode->addr.GetNetwork()), pnode->GetId(),
+ fLogIPs ? strprintf(": %s", pnode->addr.ToStringAddrPort()) : "",
+ pnode->ConnectionTypeAsString());
+ return;
+ }
2023-08-09T18:52:42.433930Z [net:debug] Trying to make a network-specific cjdns connection to [fc70:de9d:7fe2:b32:5828:1a3c:d0f:83ec]:8333 (outbound-full-relay)
2023-08-09T18:53:16.324002Z [net:debug] Protecting from eviction last remaining outbound cjdns peer=26 that relays transactions: [fc70:de9d:7fe2:b32:5828:1a3c:d0f:83ec]:8333 (outbound-full-relay)
} // no default case, so the compiler can warn about missing cases | ||
|
||
assert(false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can more simply use the existing helpers.
bool IsManualOrFullOutboundConn() const { return IsFullOutboundConn() || IsManualConn(); }
@@ -1795,6 +1825,17 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect) | |||
next_feeler = GetExponentialRand(now, FEELER_INTERVAL); | |||
conn_type = ConnectionType::FEELER; | |||
fFeeler = true; | |||
} else if (nOutboundFullRelay == m_max_outbound_full_relay && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first line of this conditional seems redundant, as it is already checked a few lines earlier for the OUTBOUND_FULL_RELAY
case.
This is joint work with mzumsande.
This is a proposal to diversify outbound connections with respect to reachable networks. The existing logic evaluates peers for connection based purely on the frequency of available addresses in
AddrMan
. This PR adds logic to automatically connect to alternate reachable networks and adds eviction logic that protects one existing connection to each network.For instance, if
AddrMan
is populated primarily with IPv4 and IPv6 addresses and only a handful of onion addresses, it is likely that we won't establish any automatic outbound connections to Tor, even if we're capable of doing so. For smaller networks like CJDNS, this is even more of an issue and often requires adding manual peers to ensure regularly being connected to the network.Connecting to multiple networks improves resistance to eclipse attacks for individual nodes. It also benefits the entire p2p network by increasing partition resistance and privacy in general.
The automatic connections to alternate networks is done defensively, by first filling all outbound slots with random addresses (as in the status quo) and then adding additional peers from reachable networks the node is currently not connected to. This approach ensures that outbound slots are not left unfilled while attempting to connect to a network that may be unavailable due to a technical issue or misconfiguration that bitcoind cannot detect.
Once an additional peer is added and we have one more outbound connection than we want, outbound eviction ensures that peers are protected if they are the only ones for their network.
Manual connections are also taken into account: If a user already establishes manual connections to a trusted peer from a network, there is no longer a need to make extra efforts to ensure we also have an automatic connection to it (although this may of course happen by random selection).