jonatack
Member

to allocate our limited outbound slots correctly, and to ensure addnode
connections benefit from their intended protections.

Our addnode logic usually connects the addnode peers before the automatic
outbound logic does, but not always, as a connection race can occur. If an
addnode peer disconnects us and it was the only one from its network, there
can be a race between reconnecting to it via the addnode thread and it being
picked as an automatic network-specific outbound peer. Or our internet
connection or router, or the addnode peer, could be temporarily offline and
then return online while the automatic outbound thread runs. Or we could add
a new manual peer using the addnode RPC at that time.

The race can be more apparent when our node doesn't know many peers, or with
networks like cjdns that currently have few bitcoin peers.

When an addnode peer is connected as an automatic outbound peer and is the only
connection we have to a network, it can be protected by our new outbound
eviction logic and persist in the "wrong role".

Examples on mainnet using logging added in the same pull request:

2023-08-12T14:51:05.681743Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
[net:debug] Not making automatic network-specific outbound-full-relay connection
to i2p peer selected for manual (addnode) connection: [geh...odq.b32.i2p]:0

2023-08-13T03:59:28.050853Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
[net:debug] Not making automatic block-relay-only connection to onion peer
selected for manual (addnode) connection: kpg...aid.onion:8333

2023-08-13T16:21:26.979052Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
[net:debug] Not making automatic network-specific outbound-full-relay connection
to cjdns peer selected for manual (addnode) connection: [fcc...8ce]:8333

2023-08-14T20:43:53.401271Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
[net:debug] Not making automatic network-specific outbound-full-relay connection
to cjdns peer selected for manual (addnode) connection: [fc7...59e]:8333

2023-08-15T00:10:01.894147Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
[net:debug] Not making automatic feeler connection to i2p peer selected for
manual (addnode) connection: geh...odq.b32.i2p:8333

Finally, there does not seem to be a reason to make block-relay-only or
short-lived feeler connections to addnode peers, as the addnode logic will
ensure we connect to them, if they are up, within the addnode connection limit.

Fix these issues by checking if the address is an addnode peer in our automatic
outbound connection logic.
@DrahtBot
Contributor

DrahtBot commented Nov 16, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK mzumsande, vasild, guggero, brunoerg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@DrahtBot DrahtBot added the P2P label Nov 16, 2023
@jonatack
Member Author

Reported in https://github.com/bitcoin/bitcoin/pull/27213/files#r1291926369 and extracted from an earlier version of #28248.

The first commit ought to be feasible to backport, if desired. The second test commit would require #28155.

@dergoegge
Member

Thank you for isolating this change and making it clearer what you were going for.

From what I understand the "regression" is not that we might make automatic connections to addnode peers but rather that if that happens these connections might stick around as automatic connections for longer (due to the new eviction logic), while not being labeled as manual.

So in the end a user might end up with fewer outbound connections than expected for longer than is currently possible, with some of their manual connections incorrectly labeled as automatic.

If the above is correct, then I am not sure if this is worthy of a backport at this stage in the release cycle because: 1) this should be rare, and 2) the manual connections are still being made and persisted (just labeled incorrectly).

@jonatack
Member Author

Empirically, the regression is that we're actively adding addnode peers as outbound ones and keeping them there. For a couple of years I have run a node on all three of our privacy networks, with addnode entries for each, along with clearnet, and I didn't see my addnode peers being connected to as non-manual outbounds. This is new behavior for my node, it's frequent, it removes the intended privileges that addnode entries benefit from, and it impacts precisely the nodes running multiple networks, including ones with fewer peers, that used addnode entries to ensure connection to them.

@dergoegge
Member

Empirically, the regression is that we're actively adding addnode peers as outbound ones, and keeping them there.

Sure, but logically the regression is that the duration of these connections increased, not that they are being made in the first place.

I'm wondering why you only noticed this now, since the duration of these mislabeled connections prior to the new eviction logic (I assume) depended on regular outbound eviction behavior, which could also result in longer-living connections.

@jonatack
Member Author

jonatack commented Nov 16, 2023

From what I understand the "regression" is not that we might make automatic connections to addnode peers but rather that if that happens these connections might stick around as automatic connections for longer (due to the new eviction logic), while not being labeled as manual.

This stems from a change in this release where it was observed and reported. We did not previously automatically add outbound connections to addnode entries in sparse networks to the degree and regularity that we do now.

So in the end a user might end up with less outbound connections than expected for longer than is currently possible, with some of their manual connections incorrectly labeled as automatic.

Sure but logically, the regression is that the duration of these connections increased not that they are being made in the first place.

  1. This should be rare and 2) the manual connections are still being made and persisted (just labeled incorrectly).

No longer rare, and not only about mislabeled connections. The new change is also misallocating rare outbound connection slots and attributing incorrect peer privileges, as addnode peers benefit from non-eviction, are not disconnected or discouraged for bad behavior, etc.

I'm wondering why you only now noticed this, since the duration of these mislabeled connections prior to the new eviction logic (i assume) depended on regular outbound eviction behavior, which could also result in longer living connections.

Observed and reported in August on the pull that made the change (there was no reaction). While it looks like it could occur somewhat rarely before, with the change it is now frequent, and more noticeable as the connections are protected.

@mzumsande
Contributor

mzumsande commented Nov 16, 2023

Tested ACK 5e7cc41

I tested this by running on cjdns and ipv4, and then manually disconnecting an addnode cjdns peer and observing the log - both on v26.0rc2 and this branch.

Sure but logically, the regression is that the duration of these connections increased not that they are being made in the first place.

I think I disagree with that. They might now be protected from eviction due to network-specific extra peers, but that eviction is also new: before #27213, we only had the stale-tip eviction, and I think it would only extremely rarely evict such a peer because of the way it works (see this post for why I think so), so once an automatic connection was made, it was probably never evicted unless there were other causes, like we went offline or they disconnected us.

I think the mechanism mentioned in the OP (a race after getting disconnected) is probably what makes this most noticeable: ThreadOpenAddedConnections() has a long sleep (60s) that it doesn't interrupt, and even though EXTRA_NETWORK_PEER_INTERVAL = 5min is larger, it would kick in immediately after the only peer from a network disconnected us (and then the next time ~5 minutes later), so ThreadOpenConnections() would usually be quicker than ThreadOpenAddedConnections().

A completely unrelated problem is that cjdns has so few bitcoin peers on mainnet (<10?), which is a shame considering how much engineering effort went into it, but I'm not sure what to do about that.

@jonatack
Member Author

jonatack commented Nov 16, 2023

cjdns has so few peers <10? on mainnet, which is a shame considering how much engineering effort went into it, but I'm not sure what to do about that.

I also saw the network-specific issue with i2p peers, and when adding a new network, though cjdns is indeed a good way to test it. As to what we can do to further cjdns adoption: when one-click I2P support was added to Raspiblitz and Umbrel in December 2022, after this workshop and blog post, the number of recent peers returned by -addrinfo for me went from a couple hundred to quickly more than a thousand, and now a couple thousand. Cjdns, being a friend-of-a-friend topology, requires the additional step of finding a friend to connect to the network with, but we have a couple of stable ones in the hardcoded DNS seeds, and maybe I can encourage and work with the node-in-a-box packages to get support for it in (and do more related workshops).

Contributor

@vasild vasild left a comment

ACK 5e7cc41

Comment on lines +3474 to +3476
return (m_added_node_params.size() < 24 // bound the query to a reasonable limit
&& std::any_of(m_added_node_params.cbegin(), m_added_node_params.cend(),
[&](const auto& p) { return p.m_added_node == addr_str || p.m_added_node == addr_port_str; }));
Contributor

(hit "submit" too early)

nit, non-blocker: I do not think that this < 24 cap is needed. What is the max number of added nodes expected?

I think it is not worth doing, but in theory, if this is to be optimized, then we can observe that addr_port_str begins with addr_str (for example, aaa:8333 and aaa), so if we are checking whether the string aaaa is equal to either one, the way this is done now would traverse the entire aaaa twice, whereas it can be done with a single traversal. But please ignore this :)

Member Author

@jonatack jonatack Nov 17, 2023

I think the number of added nodes is up to the user, without a limit, though 8 at a time is the addnode connection limit. We can change m_added_node_params to an unordered set later on, as you suggested in #28248, to make the query constant-time on average and drop the size check. I kept it simple here.

Contributor

@guggero guggero left a comment

utACK 5e7cc41

I think it's important that nodes added by addnode always get the elevated protection level.

A completely unrelated problem is that this problem exists because cjdns has so few peers <10? on mainnet, which is a shame considering how much engineering effort went into it, but I'm not sure what to do about that.

I'm still trying to wrap my head around CJDNS and how it can be set up in a Docker-based environment. If successful, I'll try creating an Umbrel app for it, which should hopefully help grow the network.

@brunoerg
Contributor

brunoerg commented Nov 17, 2023

utACK 5e7cc41

@DrahtBot DrahtBot requested review from brunoerg and removed request for brunoerg November 17, 2023 16:36
@fanquake fanquake merged commit 4374a87 into bitcoin:master Nov 22, 2023
@fanquake
Member

Added the fix to #28872 for backporting.

fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Nov 22, 2023
Github-Pull: bitcoin#28895
Rebased-From: cc62716
fanquake added a commit that referenced this pull request Nov 22, 2023
2f86d30 doc: update release notes for v26.0rc3 (fanquake)
3b6c7f2 doc: update manual pages for v26.0rc3 (fanquake)
3db4d1c build: bump version to v26.0rc3 (fanquake)
6045f38 build: Fix regression in "ARMv8 CRC32 intrinsics" test (Hennadii Stepanov)
5eaa179 ci: Avoid toolset ambiguity that MSVC can't handle (Hennadii Stepanov)
55af112 p2p: do not make automatic outbound connections to addnode peers (Jon Atack)
5e0bcc1 ci: Switch from `apt` to `apt-get` (Hennadii Stepanov)
437a531 ci: Update apt cache (Hennadii Stepanov)
1488648 pool: change memusage_test to use int64_t, add allocation check (Martin Leitner-Ankerl)
bcc183c pool: make sure PoolAllocator uses the correct alignment (Martin Leitner-Ankerl)
7dda499 doc: regenerate example bitcoin.conf (fanquake)
5845331 doc: rewrite explanation for -par= (fanquake)

Pull request description:

  Currently backports:
  * #28858
  * #28895 (partial)
  * #28913
  * #28905
  * #28919
  * #28925

  Also includes changes for rc3, and reintegrating the release-notes.

ACKs for top commit:
  hebasto:
    re-ACK 2f86d30, only #28919 backported since my [recent](#28872 (review)) review.
  TheCharlatan:
    ACK 2f86d30

Tree-SHA512: 43c91b344d37f582081ac184ac59cf76c741317b2b69a24fcd4287eefa8333e20c545e150798f4057d6f4ac8e70ed9cba1c8dd9777b11c1cf8992cce09108727
@jonatack jonatack deleted the 2023-11-do-not-make-automatic-outbound-connection-to-addnode-peers branch November 22, 2023 18:55
kwvg added a commit to kwvg/dash that referenced this pull request Oct 26, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 26, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 26, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 26, 2024
PastaPastaPasta added a commit to dashpay/dash that referenced this pull request Oct 27, 2024
, bitcoin#27213, bitcoin#28189, bitcoin#28155, bitcoin#28895, partial bitcoin#26396 (networking backports: part 9)

09504bd merge bitcoin#28895: do not make automatic outbound connections to addnode peers (Kittywhiskers Van Gogh)
6cf206c merge bitcoin#28155: improves addnode / m_added_nodes logic (Kittywhiskers Van Gogh)
11d654a merge bitcoin#28189: diversify network outbounds release note (Kittywhiskers Van Gogh)
5dc52b3 merge bitcoin#27213: Diversify automatic outbound connections with respect to networks (Kittywhiskers Van Gogh)
291305b merge bitcoin#26497: Make ConsumeNetAddr always produce valid onion addresses (Kittywhiskers Van Gogh)
6a37934 partial bitcoin#26396: Avoid SetTxRelay for feeler connections (Kittywhiskers Van Gogh)
221a78e merge bitcoin#25156: Introduce PeerManagerImpl::RejectIncomingTxs (Kittywhiskers Van Gogh)
cc694c2 merge bitcoin#22778: Reduce resource usage for inbound block-relay-only connections (Kittywhiskers Van Gogh)
6e6de54 net: use `Peer::m_can_relay_tx` when we mean `m_tx_relay != nullptr` (Kittywhiskers Van Gogh)
77526d2 net: rename `Peer::m_block_relay_only` to `Peer::m_can_tx_relay` (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * Dependency for #6333
  * When backporting [bitcoin#22778](bitcoin#22778), the `m_tx_inventory_known_filter.insert()` call in `queueAndMaybePushInv()` had to be moved out, as `EXCLUSIVE_LOCKS_REQUIRED` does not seem to work with lambda parameter lists (though it does work with capture lists, [source](https://github.com/dashpay/dash/blob/4b5e39290c8f1c78305e597312408c84e2b06b64/src/net_processing.cpp#L5781)); see the error below

    <details>

    <summary>Compiler error:</summary>

    ```
    net_processing.cpp:5895:21: note: found near match 'tx_relay->m_tx_inventory_mutex'
    net_processing.cpp:5955:21: error: calling function 'operator()' requires holding mutex 'queueAndMaybePushInv.m_tx_inventory_mutex' exclusively [-Werror,-Wthread-safety-precise]
                        queueAndMaybePushInv(tx_relay, CInv(nInvType, hash));
                        ^
    net_processing.cpp:5955:21: note: found near match 'tx_relay->m_tx_inventory_mutex'
    net_processing.cpp:5977:17: error: calling function 'operator()' requires holding mutex 'queueAndMaybePushInv.m_tx_inventory_mutex' exclusively [-Werror,-Wthread-safety-precise]
                    queueAndMaybePushInv(inv_relay, inv);
                    ^
    ```

    </details>

    * Attempting to remove the `EXCLUSIVE_LOCKS_REQUIRED` or the `AssertLockHeld` is not an option due to stricter thread sanitization checks being applied since [dash#6319](#6319) (and in general, removing annotations is inadvisable regardless)

    * We cannot simply lock `peer->GetInvRelay()->m_tx_inventory_mutex` as a) the caller already has a copy of the relay pointer (which means that `m_tx_relay_mutex` was already held) that b) they used to lock `m_tx_inventory_mutex` (which we were already asserting), so c) we cannot simply re-lock `m_tx_inventory_mutex` as it's already held, just through a less circuitous path.

      * The reason locking is mentioned instead of asserting is that the compiler treats `peer->GetInvRelay()->m_tx_inventory_mutex` and `tx_relay->m_tx_inventory_mutex` as separate locks (despite being different ways of accessing the same thing) and would complain similarly to the error above if attempting to assert the former while holding the latter.

  * As `m_tx_relay` is always initialized for Dash, to mimic the behaviour _expected_ upstream, in [bitcoin#22778](bitcoin#22778), `SetTxRelay()` will _allow_ `GetTxRelay()` to return `m_can_tx_relay` (while Dash code that demands unconditional access can use `GetInvRelay()` instead) with the lack of `SetTxRelay()` resulting in `GetTxRelay()` feigning the absence of `m_can_tx_relay`.

    This allows us to retain the same style of checks used upstream instead of using proxies like `!CNode::IsBlockOnlyConn()` to determine whether we _should_ use `m_tx_relay`.

  * Speaking of proxies, being a block-only connection is no longer the only reason not to relay transactions

    In preparation for [bitcoin#26396](bitcoin#26396), proxy usage of `!CNode::IsBlockOnlyConn()` and `!Peer::m_block_relay_only` was replaced with the more explicit `Peer::m_can_tx_relay` before being used to gate the result of `GetTxRelay()`, to help it mimic upstream semantics.
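  The `SetTxRelay()` / `GetTxRelay()` gating described above can be sketched as follows (a simplified illustrative model of the stated semantics, not the actual Dash code): `m_tx_relay` is always allocated, but `GetTxRelay()` only exposes it once `SetTxRelay()` has been called, feigning its absence otherwise, while `GetInvRelay()` bypasses the gate for code that needs the object unconditionally.

  ```cpp
  #include <cassert>
  #include <memory>

  struct TxRelay {};

  struct Peer {
      // Always initialized for Dash, unlike upstream where it may be absent.
      std::unique_ptr<TxRelay> m_tx_relay{std::make_unique<TxRelay>()};
      bool m_can_tx_relay{false};

      void SetTxRelay() { m_can_tx_relay = true; }
      // Mimics upstream semantics: appears absent until SetTxRelay() is called.
      TxRelay* GetTxRelay() { return m_can_tx_relay ? m_tx_relay.get() : nullptr; }
      // Unconditional access for Dash code that demands it.
      TxRelay* GetInvRelay() { return m_tx_relay.get(); }
  };

  int main()
  {
      Peer peer;
      assert(peer.GetTxRelay() == nullptr);  // feigned absence before SetTxRelay()
      assert(peer.GetInvRelay() != nullptr); // unconditional access still works
      peer.SetTxRelay();
      assert(peer.GetTxRelay() != nullptr);
  }
  ```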

  ## Breaking Changes

  None expected

  ## Checklist

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)**
  - [x] I have added or updated relevant unit/integration/functional/e2e tests
  - [x] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  UdjinM6:
    utACK 09504bd

Tree-SHA512: f4f36f12f749b697dd4ad5521ed15f862c93ed4492a047759554aa80a3ce00dbd1bdc0242f7a4468f41f25925d5b79c8ab774d8489317437b1983f0a1277eecb
@bitcoin bitcoin locked and limited conversation to collaborators Nov 21, 2024