Skip to content

Conversation

amitiuttarwar
Copy link
Contributor

@amitiuttarwar amitiuttarwar commented Mar 6, 2023

This is joint work with mzumsande.

This is a proposal to diversify outbound connections with respect to reachable networks. The existing logic evaluates peers for connection based purely on the frequency of available addresses in AddrMan. This PR adds logic to automatically connect to alternate reachable networks and adds eviction logic that protects one existing connection to each network.

For instance, if AddrMan is populated primarily with IPv4 and IPv6 addresses and only a handful of onion addresses, it is likely that we won't establish any automatic outbound connections to Tor, even if we're capable of doing so. For smaller networks like CJDNS, this is even more of an issue and often requires adding manual peers to ensure regularly being connected to the network.

Connecting to multiple networks improves resistance to eclipse attacks for individual nodes. It also benefits the entire p2p network by increasing partition resistance and privacy in general.

The automatic connections to alternate networks is done defensively, by first filling all outbound slots with random addresses (as in the status quo) and then adding additional peers from reachable networks the node is currently not connected to. This approach ensures that outbound slots are not left unfilled while attempting to connect to a network that may be unavailable due to a technical issue or misconfiguration that bitcoind cannot detect.

Once an additional peer is added and we have one more outbound connection than we want, outbound eviction ensures that peers are protected if they are the only ones for their network.

Manual connections are also taken into account: If a user already establishes manual connections to a trusted peer from a network, there is no longer a need to make extra efforts to ensure we also have an automatic connection to it (although this may of course happen by random selection).

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 6, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK vasild, naumenkogs
Concept ACK ghost, 0xB10C, brunoerg, kristapsk, fanquake
Stale ACK mzumsande, ajtowns, willcl-ark

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28222 (Use shared_ptr for CNode inside CConnman by willcl-ark)
  • #27912 (net: disconnect inside AttemptToEvictConnection by willcl-ark)
  • #27509 (Relay own transactions only via short-lived Tor or I2P connections by vasild)
  • #26114 (net: Make AddrFetch connections to fixed seeds by mzumsande)
  • #25572 (refactor: Introduce EvictionManager and use it for the inbound eviction logic by dergoegge)
  • #25268 (refactor: Introduce EvictionManager by dergoegge)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@amitiuttarwar
Copy link
Contributor Author

amitiuttarwar commented Mar 6, 2023

We have 3 open questions that we are interested in hearing feedback from reviewers. We have considered many potential interactions and have laid out a couple specific ones along with tradeoffs of different approaches.

IBD

The first is regarding the interactions between this patch & IBD. Peers on privacy network might be slower than clearnet peers (eg. Tor or I2P). Our current proposal does not have any special case logic for disabling the network prioritization during IBD.

This could mean that our overall IBD speed is reduced by not having the fastest peers possible. The network peers could also be intermittently disconnected for stalling & get the node into a loop where it, for example, disconnects a Tor peer, connects to an IPV4 peer, re-adds a Tor peer and then keeps repeating. Finally, there could be interactions with the slow peers causing delays in the response time of the faster peers based on the specifics of our IBD logic. One solution is to just disable the network-specific outbound logic during IBD.

On the other hand, a node can get slow peers by random chance anyways. Whether randomly connecting to altnet peers or because a clearnet peer just happens to be slow. If they are slow, but not slow enough to be evicted due to the stalling limit, that will already have a significant impact on the IBD rate. The issues and potential solutions with improving IBD can be seen as tangential to this proposal, and we could favor not adding custom workarounds that serve as a bandaid to mask the root cause. Enabling alternative networks during IBD is already known to slow down the sync dramatically, and we can just add a warning in the documentation to help users become more aware.

Peers protected from eviction

This patch adds eviction protection for special network peers, but it only gets triggered when we have an extra outbound-full-relay peer (aka 9). This occurs when we detect a stale tip or because we added another special network peer. In these situations we’d want to ensure that we have at least one non-protected peer that can be disconnected so the node can return to the steady state of 8 full-relay outbound peers.

Currently, we protect 4 peers via m_chain_sync.m_protect, which are the first peers that send us headers corresponding to our best chain. This means 4 peers could be protected with this logic, and 3 more peers being protected by their network. If we added support for two more alternative networks, a user activating all networks could run into a problem where they have 9 protected peers in these circumstances, and unable to return to the steady state of 8 full-relay outbound peers.

Since it would require adding support for at least one more network to run into this issue, we have not addressed it in this patch. There are multiple viable mitigation strategies: One option is to reduce the number of peers we protect with the headers logic (via m_protect). Another is to limit the network-specific logic to 2 random networks or such.

Network peer protection

A potential concern of this patch is that the logic that protects network peers from eviction could be “over-protective”. For example, keeping a Tor connection only because it is the only Tor connection, not because it’s providing us meaningful data.

Any concerns around this are completely bounded by the fact that the protection logic is limited to when we have an extra peer, either through stale tip detection or through the new temporary connections for network specific peers.

However, if there is still concern, @ajtowns had an interesting idea to increase our likelihood of adding a 2nd peer from alternate networks. We could update the stale tip logic to choose the network of the extra outbound differently. Right now, the network odds reflect the proportions represented in AddrMan. Instead, we could amend the logic to select based on the proportion of networks we are connected to. This would increase our likelihood of connecting to the network, because, for example, you’re more likely to have ⅛ I2P peers than ⅛ of your addrman be I2P addresses.

Increasing the chances of connecting to a privacy network in that context has two benefits:
It serves the original intention of the stale top logic to be a last-ditch effort against a perceived eclipse attack. Overtaking multiple networks increases the cost of attack, so this is theoretically a more useful attempt than clearnet.
For this proposed patch, it would mean the node is more likely to add another peer from the alternate network, so the node is more able to replace a “bad” network peer with hopefully a better one. This would be effective since the stale tip logic tends to run a couple times a day.

This could be treated as a separate chunk of work to be evaluated in a follow-up, or it could be incorporated into this PR if reviewers believe it could help mitigate concerns about edge case possibilities.

@vasild
Copy link
Contributor

vasild commented Mar 7, 2023

Concept ACK.

This was somewhere on my TODO, thanks for picking it up!

The automatic connections to alternate networks is done defensively, by first filling all outbound slots with random addresses (as in the status quo) and then adding additional peers from reachable networks the node is currently not connected to. This approach ensures that outbound slots are not left unfilled while attempting to connect to a network that may be unavailable due to a technical issue or misconfiguration that bitcoind cannot detect.

Hmm, this is maybe ok. Another approach would be to try to open to e.g. Tor while still having < 8 outbound connections. To counter the issue of staying with < 8 connections and not being able to open to Tor because e.g. the Tor proxy is broken, then we could give up after a few attempts and then fall back to the old way of filling with random (aka IPv4 ;-)) addresses. If we overshoot the 8 connections (like described above), then for some time we will have more, until eviction kicks in. How much time is that exactly? If that is short then it would be sub-optimal to open a IPv4 connection and drop it shortly after in cases where Tor connectivity is working smoothly. If it is long then we have practically increased the limit of 8 which will cause more traffic.

In the above, the clearnet networks IPv4 and IPv6 are counted as one network because diversification between these doesn't make sense and is impossible in some network environments.

Why it doesn't make sense? I think it is an extra effort to obtain IPv4 addresses if one has IPv6 addresses and the other way around. Why impossible in some network environments? After all you know if you are going to attempt to connect to 5.6.7.8 or 2030:c0fe:....

Manual connections are also taken into account: If a user already establishes manual connections to a trusted peer from a network, there is no longer a need to make extra efforts

+1

IBD

I think there is no need to add special logic during IBD (e.g. disable this PR during IBD). Clearnet peers can also be slow and non-clearnet peers may be fast-enough. IBD should be able to handle slow/fast peers already without special hand-holding.

Peers protected from eviction

I think we don't need to worry about this now because even if we treat IPv4 and IPv6 as separate networks and all 4 from m_protected are from one network (e.g. IPv4), then we have 4 more slots for IPv6, Tor, I2P, CJDNS. If we implement more networks, maybe by that time we will have increased the default 8 or I would suggest to then start disconnecting from m_protected.

Network peer protection
A potential concern of this patch is that the logic that protects network peers from eviction could be “over-protective”. For example, keeping a Tor connection only because it is the only Tor connection, not because it’s providing us meaningful data.

I don't see it as "over-protective". Can we not have some metric of whether a peer is "providing us a meaningful data"? I am ok to deal with this in a followup.


Taking a step back, the root problem here is that addrman has many IPv4 addresses and few I2P ones (for example). If it would have equal number of addresses from each network, then this problem would not exist because then picking up a random one would have a 1/5 chance of selecting any network (thus there would not be any bias towards IPv4). Would it be simpler and solve the problem equally well if we only change addrman internally to return an address from any given network with a 1/5 chance? Then no need to change the eviction logic or the ThreadOpenConnections logic. And no need to guarantee explicitly in the code that at least one connection exists to each network. It will happen by itself most of the time.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! My thoughts on just some of your points:

I think we don't need to worry about this now because even if we treat IPv4 and IPv6 as separate networks and all 4 from m_protected are from one network (e.g. IPv4), then we have 4 more slots for IPv6, Tor, I2P, CJDNS.

I think that scenario would already be problematic: In this case, all 8 outbound peers could be protected, and while there would be no need to add network-related extra peers, we could still add an extra outbound due to stale tip (StartExtraBlockRelayPeers()), and then we wouldn't have a peer to evict.

I don't see it as "over-protective". Can we not have some metric of whether a peer is "providing us a meaningful data"? I am ok to deal with this in a followup.

I agree. We have lots of different metrics for that (e.g. all Misbehaving() calls, ConsiderEviction() for stale tip, inactivity checks) that would apply to network-specific peers in the same way as for unprotected peers. Maybe some of these can be improved, but I think this should be unrelated to this PR.

Would it be simpler and solve the problem equally well if we only change addrman internally to return an address from any given network with a 1/5 chance? Then no need to change the eviction logic or the ThreadOpenConnections logic. And no need to guarantee explicitly in the code that at least one connection exists to each network. It will happen by itself most of the time.

That would certainly be simpler (although I found the necessary non-addrman changes in this PR to be surprisingly small), but as you wrote we'd only have a connection to each network with a certain probability. I did a quick simulation (too lazy to try and do the math), and it appears that with 5 networks and 8 outbounds, the chance to be connected to all of them is just 32% (62% for 4 nets, 88% for 3, 99% for 2).
I would have to think a bit more about the consequence on eclipse attacks for this - for example, a small network like CJDNS is probably pretty easy to eclipse currently, and if we configure IPv4 and CJDNS to be reachable and choose each connection with a base probability of 50% to CJDNS, that could be pretty scary...
But just noting that the two approaches aren't exclusive and implementing both would be possible.

One interesting idea for possible follow-ups would be optional logic that ties certain behavior to networks (e.g. broadcasting wallet transactions only to peers from anonymity networks). In order for that to work, it would be good to have stronger guarantee that we are actually connected to these networks.

@vasild
Copy link
Contributor

vasild commented Mar 8, 2023

I think that scenario would already be problematic ... and then we wouldn't have a peer to evict.

"... then start disconnecting from m_protected"

the math

I think it goes like this: (with 5 networks) the chance of not selecting IPv4 for the first peer is 4/5, for the first and second peer is (4/5)^2. The chance of not selecting IPv4 in all 8 choices is p=(4/5)^8. So, the chance of not selecting IPv4 or not selecting IPv6 is 2*p. The chance of not selecting IPv4, or IPv6 or Tor or I2P or CJDNS is 5*p. In other words, the chance of not being connected to at least one network is 5*p, and thus the chance of being connected to all networks is 1-5*p, or 16% (60% for 4 nets, 88% for 3 nets, 99% for 2 nets and 100% for 1 net).

I think this is enough to ditch the idea of random-chance Select(). Also, maybe it would not be so much more simpler than this PR.

offtopic:

One interesting idea for possible follow-ups would be optional logic that ties certain behavior to networks (e.g. broadcasting wallet transactions only to peers from anonymity networks).

Yes, please! :) To take this one step further, it would be even better to create a temporary connection(s) to Tor/I2P, broadcast our transaction(s) and close the connection(s) (with I2P transient address). This way, if we broadcast another tx after e.g. 1 hour the recipient will not be able to conclude that both transactions come from the same origin.

In order for that to work, it would be good to have stronger guarantee that we are actually connected to these networks.

Would not be needed with temporary tx-broadcast connections.

@mzumsande
Copy link
Contributor

mzumsande commented Mar 8, 2023

I think it goes like this: (...)

I think this is enough to ditch the idea of random-chance Select(). Also, maybe it would not be so much more simpler than this PR.

I agree with that conclusion, but I think you can't just add the probabilities like that. I think the correct formula is
$$p={k! S(n,k) \over k^n}$$
where k is the number of networks, $n=8$ the number of outbounds and $S(n,k)$ the Stirling number of the second kind, see here for an explanation - the resulting 32.256% for all 5 networks agrees very well with my simulation results.
(sorry, this is getting offtopic, and I'll stop now - will address your other open comments soon, unless amiti is first)

@ghost
Copy link

ghost commented Mar 9, 2023

Concept ACK

@0xB10C
Copy link
Contributor

0xB10C commented Mar 12, 2023

Concept ACK: I think, having network diversity in outbound connections is helpful and improves partition resistance.

In the above, the clearnet networks IPv4 and IPv6 are counted as one network because diversification between these doesn't make sense and is impossible in some network environments.

Why it doesn't make sense? I think it is an extra effort to obtain IPv4 addresses if one has IPv6 addresses and the other way around. Why impossible in some network environments? After all you know if you are going to attempt to connect to 5.6.7.8 or 2030:c0fe:....

I also not sure if I agree with counting IPv4 and IPv6 as one network and I think this could need a bit more reasoning. Is it because they are both clearnet? Is it a workaround to don't have 8 protected peers if all networks are enabled (which we'd also face when adding another network like yggdrasil as you mentioned)?

I had the feeling that protecting only one IPv4 or IPv6 could, for example, reduce the number of inbound connections IPv6-only nodes get. However, I'm not sure if that's really the case (I need to think about this more).

@mzumsande
Copy link
Contributor

Hmm, this is maybe ok. Another approach would be to try to open to e.g. Tor while still having < 8 outbound connections. To counter the issue of staying with < 8 connections and not being able to open to Tor because e.g. the Tor proxy is broken, then we could give up after a few attempts and then fall back to the old way of filling with random (aka IPv4 ;-)) addresses. If we overshoot the 8 connections (like described above), then for some time we will have more, until eviction kicks in. How much time is that exactly? If that is short then it would be sub-optimal to open a IPv4 connection and drop it shortly after in cases where Tor connectivity is working smoothly. If it is long then we have practically increased the limit of 8 which will cause more traffic.

We thought about this, of course, because it's the most straightforward approach. However, don't like it as much as the approach here for several reasons:

  • it requires introducing an arbitrary waiting time, during which we might have less outbound connections, especially if this would happen for multiple networks at the same time.
  • having too few outbound connections is a a potential security risk for the node and slows down important processes such as IBD, the only downsides of having an extra outbound connection is to have a bit of extra traffic, plus temporarily requiring more inbound capacity from the network.
  • churning through a few short-lived extra connections has several unrelated benefits, we do it voluntarily anyway (feeler connections), so I'd think it's no big deal.
  • the conditions which precluded us from connecting to a network might correct over time (maybe we had a lot of bad addresses in our addrman in the beginning; maybe some technical problem). Just trying a couple of times right after startup, and then giving up wouldn't be ideal in this scenario.

I guess it boils down to that in general, I think temporarily having more outbound connections than our target is preferable to temporarily having less connections.

@mzumsande
Copy link
Contributor

mzumsande commented Mar 13, 2023

Why it doesn't make sense? I think it is an extra effort to obtain IPv4 addresses if one has IPv6 addresses and the other way around. Why impossible in some network environments? After all you know if you are going to attempt to connect to 5.6.7.8 or 2030:c0fe:....

I also not sure if I agree with counting IPv4 and IPv6 as one network and I think this could need a bit more reasoning. Is it because they are both clearnet? Is it a workaround to don't have 8 protected peers if all networks are enabled (which we'd also face when adding another network like yggdrasil as you mentioned)?

I think our reasoning was:

  1. They are both clearnet, so it's a bit unclear to me if there is much of a benefit in diversification here. Though maybe there is an extra effort for a potential eclipse attacker to cover both IPv4 and IPv6?
  2. IPv4-only network setups exist, I've experienced it more than once that I would be within a network of some organization and could only connect to IPv4 peers, but bitcoind wasn't able to detect that so I'd see regular attempts to make connection to IPv6 peers which all failed immediately.

So basically, we saw no clear upsides, and some downsides, and that's why we decided to add it.
But I'm not insistent on this if people think that there is enough benefit in having connections to both IPv4 and IPv6 peers - it would also make the code cleaner if we didn't have to apply this exception in multiple places.

It had nothing to do with the number of 8 protected peers. I'm sure we could find other solutions around that edge case problem, e.g. adding logic to just not try an extra outbound in the first place when we could get into a position where we don't have an unprotected peer to evict, or softening the protection during eviction as vasild suggested.

@ajtowns
Copy link
Contributor

ajtowns commented Mar 15, 2023

I'm not sure it has any influence on this pr, but did you consider doing something similar for block-relay-only peers?

Something like: connect to one generic block-relay-only peer, plus an additional block-relay-only peer for every network we support? I think that should only improve things, including during IBD. It wouldn't do anything to improve tx propagation of course, so wouldn't make this PR redundant.

@amitiuttarwar
Copy link
Contributor Author

amitiuttarwar commented Mar 19, 2023

thank you everyone for the thoughtful feedback! some questions & responses -

RE: logic to initially overshoot number of connections

@vasild

If we overshoot the 8 connections (like described above), then for some time we will have more, until eviction kicks in. How much time is that exactly? If that is short then it would be sub-optimal to open a IPv4 connection and drop it shortly after in cases where Tor connectivity is working smoothly. If it is long then we have practically increased the limit of 8 which will cause more traffic.

Since we already have short-lived connections through feelers, rotating extra block relay only connections, and stale tip detection, can you clarify the disadvantage of adding another? Especially since we wouldn’t expect this new mechanism to occur frequently, does it seem acceptable or intrinsically different somehow?

RE: ipv4 vs ipv6

@vasild: Are you suggesting that treating them as separate networks increases the cost of attack? Does the delta in energy required from the attacker feel meaningful?

@0xB10C

I had the feeling that protecting only one IPv4 or IPv6 could, for example, reduce the number of inbound connections IPv6-only nodes get. However, I’m not sure if that’s really the case (I need to think about this more).

Hm interesting.
Our current baseline is that we have no protections based on network.
By introducing network protections, we would expect the number of inbounds for IPV4 and IPV6 to drop compared to the current conditions.
It’s likely that conflating the two networks would impact the distribution between them, but would that be problematic?

Future improvements

Agreed these are orthogonal to the current proposal, but are very interesting ideas worth further exploration!
- temporary tx broadcast connections
- block-relay-only connections per network

Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@brunoerg
Copy link
Contributor

brunoerg commented Mar 21, 2023

Something like: connect to one generic block-relay-only peer, plus an additional block-relay-only peer for every network we support? I think that should only improve things, including during IBD. It wouldn't do anything to improve tx propagation of course, so wouldn't make this PR redundant.

Also, should we have (at least) one anchor per network instead of only two "generic" ones?

@amitiuttarwar
Copy link
Contributor Author

added a commit that increases fuzz coverage for select by network

@brunoerg

Also, should we have (at least) one anchor per network instead of only two "generic" ones?

yeah, def think it's worth considering in relation to increasing block-relay-peers. @mzumsande & I have started brainstorming & plan to open an issue once we formulate some initial ideas. in the meanwhile, feel free to reach out directly if you're interested in discussing further :)

@vasild
Copy link
Contributor

vasild commented Apr 7, 2023

If we overshoot the 8 connections (like described above), then for some time we will have more, until eviction kicks in. How much time is that exactly? If that is short then it would be sub-optimal to open a IPv4 connection and drop it shortly after in cases where Tor connectivity is working smoothly. If it is long then we have practically increased the limit of 8 which will cause more traffic.

Since we already have short-lived connections through feelers, rotating extra block relay only connections, and stale tip detection, can you clarify the disadvantage of adding another? Especially since we wouldn’t expect this new mechanism to occur frequently, does it seem acceptable or intrinsically different somehow?

You did not answer my question "How much time is that exactly?" but your reply seems to imply an answer "short", correct me if I misunderstand. The disadvantage of doing that is "it would be sub-optimal to open a IPv4 connection and drop it shortly after in cases where Tor connectivity is working smoothly".

@vasild: Are you suggesting that treating them as separate networks increases the cost of attack?

Yes.

Does the delta in energy required from the attacker feel meaningful?

That question is a bit vague. I guess "yes" for some values of "meaningful".

achow101 added a commit that referenced this pull request Apr 20, 2023
17e7054 doc: clarify new_only param for Select function (Amiti Uttarwar)
b0010c8 bench: test select for a new table with only one address (Amiti Uttarwar)
9b91aae bench: add coverage for addrman select with network parameter (Amiti Uttarwar)
22a4d14 test: increase coverage of addrman select (without network) (Amiti Uttarwar)
a98e542 test: add addrman test for special case (Amiti Uttarwar)
5c8b4ba tests: add addrman_select_by_network test (Amiti Uttarwar)
6b22928 addrman: add functionality to select by network (Amiti Uttarwar)
26c3bf1 scripted-diff: rename local variables to match modern conventions (Amiti Uttarwar)
4880641 refactor: consolidate select logic for new and tried tables (Amiti Uttarwar)
ca2a9c5 refactor: generalize select logic (Amiti Uttarwar)
052fbcd addrman: Introduce helper to generalize looking up an addrman entry (Amiti Uttarwar)
9bf078f refactor: update Select_ function (Amiti Uttarwar)

Pull request description:

  For the full context & motivation of this patch, see #27213

  This is joint work with mzumsande.

  This PR adds functionality to `AddrMan::Select` to enable callers to specify a network they are interested in.

  Along the way, it refactors the function to deduplicate the logic, updates the local variables to match modern conventions, adds test coverage for both the new and existing `Select` logic, and adds bench tests for the worst case performance of both the new and existing `Select` logic.

  This functionality is used in the parent PR.

ACKs for top commit:
  vasild:
    ACK 17e7054
  brunoerg:
    re-ACK 17e7054
  ajtowns:
    ACK 17e7054
  mzumsande:
    Code Review ACK 17e7054

Tree-SHA512: e99d1ce0c44a15601a3daa37deeadfc9d26208a92969ecffbea358d57ca951102d759734ccf77eacd38db368da0bf5b6fede3cd900d8a77b3061f4adc54e52d8
@amitiuttarwar amitiuttarwar force-pushed the 2023-03-network-outbounds branch from 7471e55 to 8f91e58 Compare April 21, 2023 03:38
Connman uses this new map to keep a count of active OUTBOUND_FULL_RELAY and
MANUAL connections. Unused until next commit.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
@amitiuttarwar amitiuttarwar force-pushed the 2023-03-network-outbounds branch from d8b3d90 to be73855 Compare August 3, 2023 18:49
@amitiuttarwar
Copy link
Contributor Author

amitiuttarwar commented Aug 3, 2023

thanks for the reviews & feedback @naumenkogs, @vasild, @willcl-ark & @fanquake

I've updated this PR to incorporate all the open feedback & fix up the commit messages. everything should be addressed except for these two one open item:
1. question about locking improvement #27213 (comment) (updated using suggestion from #27213 (comment))
2. release note which is kept in the follow-up, to prevent possible review churn on language feedback

should be ready for re-review. thanks!

mzumsande and others added 3 commits August 3, 2023 19:27
If a peer is the only one of its network, protect it from eviction.
This improves the diversity of outbound connections with respect to
reachable networks.

Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
Diversify outbound connections with respect to
networks: Every ~5 minutes, try to add an extra connection
to a reachable network which we currently don't have a connection to.
This is done defensively - only try management with respect to networks
after all existing outbound slots are filled.
The resulting situation with an extra outbound peer will be handled
by the extra outbound eviction logic, which protects peers from
eviction if they are the only ones for their network.

Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
@amitiuttarwar amitiuttarwar force-pushed the 2023-03-network-outbounds branch from be73855 to 1b52d16 Compare August 4, 2023 03:25
Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1b52d16

@willcl-ark
Copy link
Member

ACK 1e65aae

Updates shown by git range-diff 1e65aae806...1b52d16 look correct to me and glad to get the nits in this initial pull.

@naumenkogs
Copy link
Member

ACK 1b52d16

@DrahtBot DrahtBot removed the request for review from naumenkogs August 4, 2023 08:42
@bitcoin bitcoin deleted a comment from Malk8628 Aug 6, 2023
@fanquake
Copy link
Member

fanquake commented Aug 6, 2023

@willcl-ark note that you ACK'd the wrong commit here.

@fanquake fanquake merged commit b713825 into bitcoin:master Aug 6, 2023
fanquake added a commit that referenced this pull request Aug 9, 2023
7463d25 doc: Add release note (Amiti Uttarwar)

Pull request description:

  release notes for #27213

ACKs for top commit:
  mzumsande:
    ACK 7463d25

Tree-SHA512: 16c479774ed9242d8d044d08cc919550ccd07020423a3dcd99f07dad36e4dafd8243dc47f9f7f0c8eedcb53efd85ec65afedba56422452f637d313ec7c901520
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 9, 2023
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 9, 2023
7463d25 doc: Add release note (Amiti Uttarwar)

Pull request description:

  release notes for bitcoin#27213

ACKs for top commit:
  mzumsande:
    ACK 7463d25

Tree-SHA512: 16c479774ed9242d8d044d08cc919550ccd07020423a3dcd99f07dad36e4dafd8243dc47f9f7f0c8eedcb53efd85ec65afedba56422452f637d313ec7c901520
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-merge ACK.

The last commit, p2p: network-specific management of outbound connections, looks like it could use test coverage.

I've been testing the changes here these past two days, with added/improved logging to monitor the new behavior.

@@ -1895,6 +1939,9 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
}
LogPrint(BCLog::NET, "Making feeler connection to %s\n", addrConnect.ToStringAddrPort());
}

if (preferred_net != std::nullopt) LogPrint(BCLog::NET, "Making network specific connection to %s on %s.\n", addrConnect.ToStringAddrPort(), GetNetworkName(preferred_net.value()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions for this log entry:

  • use LogPrintLevel

  • replace "Making" with "Trying to make" (like the anchor connection logging), as connections to privacy networks regularly fail

  • add missing hyphen to "network specific" and log the network just after it, instead of at the end where it is less useful (and a bit less clear, e.g. "on onion")

  • log the IP address only if the -logips config option is set

  • add the connection type

possible diff

-
-            if (preferred_net != std::nullopt) LogPrint(BCLog::NET, "Making network specific connection to %s on %s.\n", addrConnect.ToStringAddrPort(), GetNetworkName(preferred_net.value()));
+            if (preferred_net.has_value()) {
+                LogPrintLevel(BCLog::NET, BCLog::Level::Debug, "Trying to make a network-specific %s connection %s(%s)\n",
+                              GetNetworkName(preferred_net.value()),
+                              fLogIPs ? strprintf("to %s ", addrConnect.ToStringAddrPort()) : "",
+                              ConnectionTypeAsString(ConnectionType::OUTBOUND_FULL_RELAY));
+            }

Example output:

[net:debug] Trying to make a network-specific cjdns connection to [fc70:de9d:7fe2:b32:5828:1a3c:d0f:83ec]:8333 (outbound-full-relay)

@@ -5159,6 +5161,9 @@ void PeerManagerImpl::EvictExtraOutboundPeers(std::chrono::seconds now)
if (state == nullptr) return; // shouldn't be possible, but just in case
// Don't evict our protected peers
if (state->m_chain_sync.m_protect) return;
// If this is the only connection on a particular network that is
// OUTBOUND_FULL_RELAY or MANUAL, protect it.
if (!m_connman.MultipleManualOrFullOutboundConns(pnode->addr.GetNetwork())) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We log the disconnection and protection actions in EvictExtraOutboundPeers() and it would be good to log this action as well.

While testing and monitoring the changes in this pull, I found the following diff to be helpful.

-            if (!m_connman.MultipleManualOrFullOutboundConns(pnode->addr.GetNetwork())) return;
+            if (!m_connman.MultipleManualOrFullOutboundConns(pnode->addr.GetNetwork())) {
+                LogPrintLevel(BCLog::NET, BCLog::Level::Debug,
+                              "Protecting from eviction last remaining outbound %s peer=%d that relays transactions%s (%s)\n",
+                              GetNetworkName(pnode->addr.GetNetwork()), pnode->GetId(),
+                              fLogIPs ? strprintf(": %s", pnode->addr.ToStringAddrPort()) : "",
+                              pnode->ConnectionTypeAsString());
+                return;
+            }
2023-08-09T18:52:42.433930Z [net:debug] Trying to make a network-specific cjdns connection to [fc70:de9d:7fe2:b32:5828:1a3c:d0f:83ec]:8333 (outbound-full-relay)
2023-08-09T18:53:16.324002Z [net:debug] Protecting from eviction last remaining outbound cjdns peer=26 that relays transactions: [fc70:de9d:7fe2:b32:5828:1a3c:d0f:83ec]:8333 (outbound-full-relay)

} // no default case, so the compiler can warn about missing cases

assert(false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can more simply use the existing helpers.

bool IsManualOrFullOutboundConn() const { return IsFullOutboundConn() || IsManualConn(); }

@@ -1795,6 +1825,17 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
next_feeler = GetExponentialRand(now, FEELER_INTERVAL);
conn_type = ConnectionType::FEELER;
fFeeler = true;
} else if (nOutboundFullRelay == m_max_outbound_full_relay &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line of this conditional seems redundant, as it is already checked a few lines earlier for the OUTBOUND_FULL_RELAY case.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.