Skip to content

Conversation

mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Sep 28, 2022

This picks up the last commit from #19843.

Previously, we would prepare to self-announce to a new peer while parsing a version message from that peer.
This is redundant, because we do something very similar in MaybeSendAddr(), which is called from SendMessages() after
the version handshake is finished.

There are a couple of differences:

  1. MaybeSendAddr() self-advertises to all peers we do address relay with, not just outbound ones.
  2. GetLocalAddrForPeer() called from MaybeSendAddr() makes a probabilistic decision to either advertise what they think we are or what we think we are, while PushAddress() on version deterministically only does the former if the address from the latter is unroutable.
  3. During version processing, we haven't received a potential sendaddrv2 message from our peer yet, so self-advertisements with addresses from addrV2-only networks would always be dropped in PushAddress().

Since it's confusing to have two slightly different mechanisms for self-advertising, and the one in MaybeSendAddr() is better, remove the one in version.

Previously, we would prepare to self-announce to a new peer while
parsing a VERSION message from that peer. This is redundant, because we
do something very similar in MaybeSendAddr(), which is called from
SendMessages() after the version handshake is finished.

There are a couple of differences:

1) MaybeSendAddr() self-advertises to all peers we do address relay with,
   not just outbound ones.
2) GetLocalAddrForPeer() called from MaybeSendAddr() makes a
   probabilistic decision to either advertise
   what they think we are or what we think we are, while
   PushAddress(self) on VERSION deterministically only does
   the former if the address from the latter is unroutable.
3) During VERSION processing, we haven't received a potential sendaddrv2 message
   from our peer yet, so self-advertisements with addresses from addrV2-only networks
   would always be dropped in PushAddress().

Since it's confusing to have two slightly different mechanisms for self-advertising,
and the one in MaybeSendAddr() is better, remove the one in VERSION.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
@mzumsande
Copy link
Contributor Author

FYI @naumenkogs @dergoegge

@sipa
Copy link
Member

sipa commented Sep 28, 2022

Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 28, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK amitiuttarwar, stratospher, naumenkogs
Concept ACK sipa, dergoegge, glozow

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25619 (net: avoid overriding non-virtual ToString() in CService and use better naming by vasild)
  • #24008 (assumeutxo: net_processing changes by jamesob)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the P2P label Sep 28, 2022
@@ -3274,39 +3274,16 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
m_num_preferred_download_peers += state->fPreferredDownload;
}

// Self advertisement & GETADDR logic
if (!pfrom.IsInboundConn() && SetupAddressRelay(pfrom, *peer)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. SetupAddressRelay used to have a misleading comment: "First addr message we have received from the peer, initialize m_addr_known". We were not receiving any addr messages from the peer here, yet we initialized it. It seems like this was the case from the beginning.
  2. After this PR, calling SetupAddressRelay here seems unnecessary. If we drop it, the comment becomes valid :)
    Although the inconsistency is a good signal the comment is confusing, so perhaps we should drop it. Not very useful anyway, at least the current code is pretty readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah: 1) Drop SetupAddressRelay here; 2) Possibly drop the useless comment from SetupAddressRelay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, this has side-effects. MaybeSendAddr (another self-ad) will stop working until we get an ADDR message from the peer (not the case before the change i suggest here).

Perhaps this code should be:

if (!pfrom.IsInboundConn()) {
    SetupAddressRelay(pfrom, *peer);
    ....
}

Copy link
Contributor Author

@mzumsande mzumsande Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current code is correct, SetupAddressRelay here has three tasks:

  1. Initialize m_addr_known for outbound peers (unless block-relay-only), see [p2p] Reduce addr blackholes  #21528
  2. Prevent Self-Advertisements to block-relay-only outbound peers (for which SetupAddressRelay returns false)
  3. Prevent a GETADDR to block-relay-only outbound peers

Since 1) and 3) are still valid after this PR, I think we both need to keep calling SetupAddressRelay and skip if it returns false.
I think this should be documented better (I had to revisit 21528 to recall how it worked while preparing this PR), so I will add some more documentation for this soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring I suggested in my last message here is indeed wrong, because it ignores the returned value.

Do you agree that the comment inside SetupAddressRelay is wrong though? That's the only thing that remains worrying me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you agree that the comment inside SetupAddressRelay is wrong though?

Yeah, it's incomplete because it only applies to inbound peers. I added documentation there and before the call to SetupAddressRelay during version processing (as a second commit, because it's not directly related to the changes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Prevent GETADDR from being sent" sounds very complicated, and also might have additional unnecessary meaning of pro-actively preventing something.

I'd say "Don't send GETADDR [...]" is better. Feel free to ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this comment, changed wording now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed that the comment in SetupAddressRelay was misleading. thanks for catching / fixing. I didn't hunt down the history, but I bet that was my fault 😅

RE current comment about "Don't send GETADDR...":
I find it a bit confusing, because we're also not sending GETADDR message to inbound peers. maybe this code is too dense and is better communicated in a bit more verbose manner?

eg. in non-syntax-adhering code:

bool send_getaddr = false;
if (!pfrom.IsInboundConn()) { 
		send_getaddr = SetupAddressRelay();
    }
if (send_getaddr){
	// do GETADDR stuff
}

also cool if you want to skip the refactor and just want to update the comment to something like..

// Initialize address relay for all outbound peers
// Send GETADDR to non-block-relay-only outbound peers

I wish we had a better way to refer to non-block-relay-only outbound peers 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it this would be better to understand if it's a bit more spread out, so I took your suggestion and changed the comment - hopefully it's clear now?

@dergoegge
Copy link
Member

Concept ACK

@naumenkogs
Copy link
Member

naumenkogs commented Sep 30, 2022

I dunno if I'm supposed to ACK this since this is my commit, but I approve the changes 3c43d9d I guess :)

An easy way to look at this change is to notice that this code is almost entirely a subset of MaybeSendAddr. At the same time, it heavily relies on MaybSendAddr to actually do something, not much would change if you drop the duplicate subset (what this PR does).

One real difference this could make is that due to the "makes a probabilistic decision to either advertise what they think we are or what we think we are" of this behavior, we could have ended up having two self-ads in one ADDR message (one added here, one added in MaybeSendAddr if they are different). I don't think this is a show-stopper for this PR. I don't think we want to preserve this behavior.

UPD:

Another important difference could be the time between the two calls. This might make sense if there's a risk of someone inserting our self-ad (PushAddress) with the wrong metadata, eg nTime. Then our latter self-ad insertion would be dismissed (because we just look up the key), while the VERSION-one is protected from this issue.
I don't think this is a show-stopper for this PR, although this surface could indeed be improved (e.g., handle duplicates better, or handle other-nodes-announce-us-through-us differently).

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@mzumsande
Copy link
Contributor Author

One real difference this could make is that due to the "makes a probabilistic decision to either advertise what they think we are or what we think we are" of this behavior, we could have ended up having two self-ads in one ADDR message (one added here, one added in MaybeSendAddr if they are different). I don't think this is a show-stopper for this PR. I don't think we want to preserve this behavior.

Yes, advertising both addresses is something that can happen, but only in a few cases:
e.g. if the address during version message processing is AddrV2-only, it is not sent out because we haven't gotten sendaddrv2 yet. I think it might happen if we discovered a local IPv6 address and our peer sees us under a IPv4 address?
I think that if this is something we really want, we could always advertise both (similar to your commit 10fc62f) but I'm not sure it is necessary - after all, we have the nScore logic to make an educated guess which address is better.

Another important difference could be the time between the two calls. This might make sense if there's a risk of someone inserting our self-ad (PushAddress) with the wrong metadata, eg nTime. Then our latter self-ad insertion would be dismissed (because we just look up the key), while the VERSION-one is protected from this issue.
I don't think this is a show-stopper for this PR, although this surface could indeed be improved (e.g., handle duplicates better, or handle other-nodes-announce-us-through-us differently).

I think I don't understand this completely, but the time between advertising during version message and MaybeSendAddr should usually be negligible, because m_next_local_addr_send is initialized to 0, so MaybeSendAddr always advertises right after the version handshake is completed.
Also, we clear m_addr_known right before self-advertising in MaybeSendAddr (code) so it wouldn't be possible for an attacker to do something like preventing us from self-advertising by sending us our own address. Is that what you meant?

@naumenkogs
Copy link
Member

ACK 24608c2

@amitiuttarwar
Copy link
Contributor

code review ACK 1002cd1. this PR improves the encapsulation of self-advertisement logic, which simplifies the code as well as our mental models of addr relay on the network.

@naumenkogs

Another important difference could be the time between the two calls.

@mzumsande

I think I don't understand this completely, but the time between advertising during version message and MaybeSendAddr should usually be negligible, because m_next_local_addr_send is initialized to 0, so MaybeSendAddr always advertises right after the version handshake is completed.

Agreed, but I'd take it slightly further. I think there is no difference in when the advertisement would actually occur. the ProcessMessage -> version logic called PushAddress which (at best) adds the address to m_addrs_to_send. The actual constructing & sending of the message wouldn't occur until the SendMessages loop calls MaybeSendAddr.


My understanding of this PR is that there would be no tangible behavioral differences in the cases where the version logic returned the same address as MaybeSendAddr logic. on master, if they result in two different addr, I believe we would advertise both. after this PR, we would only advertise the result of GetLocalAddrForPeer, which will select one of the two.

Since the version logic always prioritized GetLocalAddress (aka what we think we are), the behavioral difference would occur in instances when GetLocalAddrForPeer (aka weighted guess) selects the result from IsPeerAddrLocalGood (aka what peer thinks we are).

I think the only time this would cause a connectivity issue is if your peers are somehow reporting your address to be something that can't be connected to, and you would have advertised something usable with GetLocalAddress. I expect this to be quite rare, but additionally your self-identified address would eventually get out when GetLocalAddrForPeer selects the result of GetLocalAddress.

I think that if this is something we really want, we could always advertise both (similar to your commit 10fc62f) but I'm not sure it is necessary - after all, we have the nScore logic to make an educated guess which address is better.

For the reasons I stated above, I agree. Advertising both seems unnecessary, and the worst case should just be slower time for inbounds to connect to you?

@mzumsande @naumenkogs curious to hear your feedback. do these claims track with your understanding?

@@ -3274,39 +3274,16 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
m_num_preferred_download_peers += state->fPreferredDownload;
}

// Self advertisement & GETADDR logic
if (!pfrom.IsInboundConn() && SetupAddressRelay(pfrom, *peer)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed that the comment in SetupAddressRelay was misleading. thanks for catching / fixing. I didn't hunt down the history, but I bet that was my fault 😅

RE current comment about "Don't send GETADDR...":
I find it a bit confusing, because we're also not sending GETADDR message to inbound peers. maybe this code is too dense and is better communicated in a bit more verbose manner?

eg. in non-syntax-adhering code:

bool send_getaddr = false;
if (!pfrom.IsInboundConn()) { 
		send_getaddr = SetupAddressRelay();
    }
if (send_getaddr){
	// do GETADDR stuff
}

also cool if you want to skip the refactor and just want to update the comment to something like..

// Initialize address relay for all outbound peers
// Send GETADDR to non-block-relay-only outbound peers

I wish we had a better way to refer to non-block-relay-only outbound peers 😬

@amitiuttarwar
Copy link
Contributor

oops, didn't post last night - just some suggestions around language in the comments. not essential to incorporate.

@naumenkogs
Copy link
Member

Agreed, but I'd take it slightly further. I think there is no difference in when the advertisement would actually occur.

This was the initial purpose of the PR, yeah. In my reading of your comment, I'm not sure it "takes further" in that sense. My finding was more niche (again, probably very hypothetical) — if only the latter call is in place, the data could be hijacked. Trying to make sure we're on the same page :)

I agree with the rest of your comment.

This code was a bit hard to understand, so make it less dense and
add more explanations. Doesn't change behavior.

Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
@mzumsande mzumsande force-pushed the 202209_advertise_once branch from 1002cd1 to 956c670 Compare November 23, 2022 21:20
@mzumsande
Copy link
Contributor Author

1002cd1 to 956c670: Applied suggestions by @amitiuttarwar to second commit - thanks!

do these claims track with your understanding?

yes, I agree!

the data could be hijacked.

I didn't understand this the first time, but I think I do now - you mean an attacker could continuously spam us with our own address from other connections (but with bad metadata), and thus prevent us from sending our self-advertisement (with correct metadata) to new peers because they'd inserted a duplicate in between our processing of the version messages and us getting to MaybeSendAddress().

I will have to think about this a bit more, but even if the timing issues and rate-limiting issues were solved, it would probably be really hard / impossible to that because of the way RelayAddress() relays a given address (no matter which peer it is received from) always to the same peers within 24 hours (which might change slightly with peers fluctuating, but not very much). It should be impossible for an attacker to make us relay a specific node in a targeted way to new nodes.

Copy link
Contributor

@stratospher stratospher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. I went through the self advertisement logic and couldn't think of any concerning behavioral change this PR might induce. I do have a couple of questions though.

  1. There were some entries in my node's debug.log where the peer would send incorrect entries(here, the peer's own address/they could be tampered with) for what address the peer thinks we are (vRecv >> addrMe).
    • with the version processing mechanism, we'd self advertise our correct address at least once. (GetLocalAddress)
    • with the MaybeSendAddr mechanism, we'd sometimes self advertise addrMe (what peer thinks we are/could be tampered with) only. (GetLocalAddrForPeer)
    • The connections would last for only a short duration though(around 4 mins). Could this scenario be problematic?
  2. even if we don't accept inbound connections, we still self-advertise. is it because we can't detect them?

@mzumsande
Copy link
Contributor Author

  1. There were some entries in my node's debug.log where the peer would send incorrect entries(here, the peer's own address/they could be tampered with) for what address the peer thinks we are (vRecv >> addrMe).

    • with the version processing mechanism, we'd self advertise our correct address at least once. (GetLocalAddress)
    • with the MaybeSendAddr mechanism, we'd sometimes self advertise addrMe (what peer thinks we are/could be tampered with) only. (GetLocalAddrForPeer)
    • The connections would last for only a short duration though(around 4 mins). Could this scenario be problematic?

I think that it's not possible to say in general which address (the local or peer's one) is better, or "correct". In some cases, the peer may have a better idea how we are reachable than we do. In other cases, it's the other way round, or maybe neither are working if we're just not reachable by others.

Also note that if we overrule our own local address and advertise with the address a peer sees us with, we'll only do that back to that same peer. We never send an addrMe address from one peer to other peers.
Therefore, I think that peers can only influence the weight (nScore) between multiple local addresses (all of which we detected by ourselves at startup!) via SeenLocal() but never insert new ones that they pick - this should make it impossible to trick us into advertising an incorrect address to other peers. If a peer is malicious, they can just discard the self-announcements we send them anyway, so it's not an attack if such a peer makes us advertise to them with a wrong address.

  1. even if we don't accept inbound connections, we still self-advertise. is it because we can't detect them?

If we explicitly disable inbound connections (fListen == false), we don't self-advertise, see here.
But yes, it is a very common situation (e.g. if we're behind a NAT, or the port for bitcoin is blocked by our firewall) that bitcoind isn't able to detect that we're actually unreachable - so self-advertisements happen in this case.

Copy link
Contributor

@amitiuttarwar amitiuttarwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naumenkogs @mzumsande
thanks for the feedback. I think I now better understand the hypothetical issue in terms of when we queue up the addresses internally, not when we actually trigger the message. due to the various mechanisms Martin mentioned, I think most viable attacks would be mitigated.

reACK 956c670

@stratospher
Copy link
Contributor

ACK 956c670

@fanquake fanquake requested a review from naumenkogs December 1, 2022 16:53
@naumenkogs
Copy link
Member

ACK 956c670

@maflcko maflcko merged commit 6061eb6 into bitcoin:master Dec 12, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 12, 2022
956c670 refactor, doc: Improve SetupAddressRelay call in version processing (Martin Zumsande)
3c43d9d p2p: Don't self-advertise during VERSION processing (Gleb Naumenko)

Pull request description:

  This picks up the last commit from bitcoin#19843.

  Previously, we would prepare to self-announce to a new peer while parsing a `version` message from that peer.
  This is redundant, because we do something very similar in `MaybeSendAddr()`, which is called from `SendMessages()` after
  the version handshake is finished.

  There are a couple of differences:

  1) `MaybeSendAddr()` self-advertises to all peers we do address relay with, not just outbound ones.
  2) `GetLocalAddrForPeer()` called from `MaybeSendAddr()` makes a probabilistic decision to either advertise what they think we are or what we think we are, while `PushAddress()` on `version` deterministically only does the former if the address from the latter is unroutable.
  3) During `version` processing, we haven't received a potential sendaddrv2 message from our peer yet, so self-advertisements with addresses from addrV2-only networks would always be dropped in `PushAddress()`.

  Since it's confusing to have two slightly different mechanisms for self-advertising, and the one in `MaybeSendAddr()` is better, remove the one in `version`.

ACKs for top commit:
  stratospher:
    ACK  956c670
  naumenkogs:
    ACK 956c670
  amitiuttarwar:
    reACK 956c670

Tree-SHA512: 933d40615289f055c022170dde7bad0ac0a1d4be377538bfe9ba64375cfeb03bcd803901591f0739ac4850c880e8475a68fd1ab0330800030ab7f19e38c00274
@mzumsande mzumsande deleted the 202209_advertise_once branch December 12, 2022 16:30
@bitcoin bitcoin locked and limited conversation to collaborators Dec 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants