-
Notifications
You must be signed in to change notification settings - Fork 5.7k
BIP155: include changes from followup discussions #907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Was it decided how to signal support for the new
|
cc @dongcarl (not sure if you have been working on this as well in the past) |
Right, so the reason why there hasn't been a PR updating the BIP is because we were still working out all the details, and was using the #766 thread as the place for discussion. Some offline discussion has taken place as well, so I should update that thread. |
4e7714b
to
181a224
Compare
So, this PR updates some bits that have settled (up to my understanding). How to signal for BIP155 is not one of them and there is still debate on how to best achieve it. It's not in this PR. I think it will deserve a separate PR once it settles. |
I'm not sure it's a good idea to coordinate all of the changes/ideas via small commits. The issue being that since BIPs are big conceptual ideas, it's usually better when there's someone (or several people) championing changes. Otherwise, it's harder too coordinate and get enough review for the change. I'm not even sure people watch these BIP repo PRs much... It's obviously fine to submit minor improvements to already existing BIPs, but if they are Final, not when there's a lot of uncertainty :) Said that, I think everyone will be happy if @vasild chooses to move progress on BIP155 forward, I'd just rather see progress in bigger chunks, along with a substantial discussion (mailing list, weekly meeting, etc). |
@dongcarl or @naumenkogs Would you be willing to be a co-champion? @laanwj Would you be willing to assign a co-champion? |
@MarcoFalke I'll be able to timely answer all the questions to the best of my knowledge, talk about different options and arguments, proofread. |
ACK 181a224 |
Appended a commit to change the signaling from "protocol version" to a "dedicated message", render preview. |
bip-0155.mediawiki
Outdated
|
||
Introduce a new message type <code>sendaddrv2</code>. Sending such a message indicates that a node can understand and prefers to receive <code>addrv2</code> messages instead of <code>addr</code> messages. I.e. "Send me addrv2". | ||
|
||
<code>sendaddrv2</code> SHOULD be sent immediately after the <code>version</code> message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even before receiving versionack
from the peer? (or doesn't this matter?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does not matter, but the current code in bitcoin/bitcoin#19031 does it after receiving the verack
message, similarly to sendheaders
and sendcmpct
.
I updated this PR to clarify that.
934c2de
to
9286b52
Compare
Clarify when to send the |
Needs re-ACK I guess |
ACK 9286b52 |
I agree that this is potentially a worthwhile change because there isn't otherwise any negotiation about what kind of things are permitted inside address messages. Without it a whole v3 address message type would be needed just to support some larger addresses down the line. But makes the maximum size of an addr message 512kb, that is a pretty big slug of latency. Should this increase be couple with a maximum size of the total addr message?
I think this is a bad change and it isn't justified by the linked comment. Gleb's rational is related to networks that are not unknown to you but which you aren't connected to-- such as HS peers for a non HS node. I agree that nodes probably should store and forward popular network types that they aren't connected to, subject to the limitation that since they can't test these peers, they can't benefit from the nodes own livelyness detection and might just be garbage-- so it would be entirely understandable if they were deprioritized. I don't see any rational given in the link for storing or relaying entirely unknown address types. Particularly since you'll have no way to filter or sanity check these values. I think the combination of these two is particularly bad, effectively constructing a moderately high bandwidth free relay mechanism without a current productive use. Instead, if a new type is put into use-- what is the harm in older nodes just ignoring it? Eventually they'll upgrade and carry it too.
In combination with the prior point, this creates a transitive DOS attack in the presence of future or incomplete implementations. E.g. IETF smokes a great big bong and creates an IPv8 protocol that has 136 bit addresses. Bitcoin software is updated with a IPv8 type with a limit of 17 byte addresses. Now I start announcing v8-type addresses with 18 bytes in them, and non-upgraded peers now suddenly start getting all their addr messages ignored by newer peers because they're relaying address types that they don't understand which violate rules they don't know about. Can be resolved by strictly filtering things you know about and not relaying anything else. |
FWIW the implementation at bitcoin/bitcoin#19031 ignores addresses from unknown networks. The text in this PR reads:
What about changing it to:
|
I would go stronger and say SHOULD NOT because if a participant relays an address they don't understand it may violate the validity rules and get their messages ignored. |
I agree with the following suggestion from Greg:
I would drop the following part from your suggestion, because it's too specific while talking about the behavior that doesn't exist:
I would make it more general, although I don't have a strong opinion here. |
Good? |
sounds fine to me |
* Increase the maximum length of an address from 32 to 512 bytes and clarify that the entire message should be rejected if it contains a longer address. (from bitcoin#766 (comment)) * Remove a contradiction about unknown address types - "MUST ignore" VS "MAY store and gossip". * Recommend gossiping addresses for networks to which the node is not currently connected to. (from bitcoin#766 (comment)) * Clarify that the entire message should be rejected if it contains an address with unexpected size (e.g. IPv4 address with length 5).
@naumenkogs, @gmaxwell dare to ACK this one? Last activity here is that I reworded it as per your suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 350189a
Can ignore the comments unless you need to retouch.
==Compatibility== | ||
==Signaling support and compatibility== | ||
|
||
Introduce a new message type <code>sendaddrv2</code>. Sending such a message indicates that a node can understand and prefers to receive <code>addrv2</code> messages instead of <code>addr</code> messages. I.e. "Send me addrv2". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/. I.e./, e.g./
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the plain language phrase is needed, the explanation is clear enough without it.
If it's kept, then I think "i.e." is correct. It's a rewording of the previous sentence, not giving an example of a larger class of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's kept, then I think "i.e." is correct. It's a rewording of the previous sentence, not giving an example of a larger class of things.
I agree.
Still prefer s/. I.e./, i.e./
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, agree i.e. is preferred over I.e.
I guess ideally (in the case of it being kept) it would be something like:
... messages, i.e. "Send me addrv2".
... messages - i.e. "Send me addrv2".
static const int GOSSIP_ADDRV2_VERSION = 70016; | ||
</source> | ||
For older peers keep sending the legacy <code>addr</code> message, ignoring addresses with the newly introduced address types. | ||
For older peers, that did not emit <code>sendaddrv2</code>, keep sending the legacy <code>addr</code> message, ignoring addresses with the newly introduced address types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/peers, /peers /
style: s/keep/continue/
ACK 350189a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 350189a
Reviewed the discussion around gossiping unknown networks most closely.
All of the following are RFC 2119 nits and can probably be ignored.
The blockquote regarding RFC 2119 at the beginning is wrong (though it's such a popular error that it probably doesn't need to be fixed). From RFC 2119's Errata, "NOT RECOMMENDED" is missing, it should say
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
==Compatibility== | ||
==Signaling support and compatibility== | ||
|
||
Introduce a new message type <code>sendaddrv2</code>. Sending such a message indicates that a node can understand and prefers to receive <code>addrv2</code> messages instead of <code>addr</code> messages. I.e. "Send me addrv2". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the plain language phrase is needed, the explanation is clear enough without it.
If it's kept, then I think "i.e." is correct. It's a rewording of the previous sentence, not giving an example of a larger class of things.
|
||
Introduce a new message type <code>sendaddrv2</code>. Sending such a message indicates that a node can understand and prefers to receive <code>addrv2</code> messages instead of <code>addr</code> messages. I.e. "Send me addrv2". | ||
|
||
<code>sendaddrv2</code> SHOULD be sent after receiving the <code>verack</code> message from the peer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is completely clear.
If it is supposed to mean "... as opposed to before receiving the verack message from the peer" (like almost every other message) then:
s/SHOULD/MUST/
If it is supposed to mean "... and before anything else" (i.e. soon after receiving the verack
) then maybe it should be clarified to say that, but I'm not sure what precisely can be said. Looking at the implementation it seems like we could technically send sendaddrv2
whenever after verack
, and honestly I don't see a huge problem with that.
Clients are RECOMMENDED to gossip addresses from all known networks even if they are currently not connected to some of them. That could help multi-homed nodes and make it more difficult for an observer to tell which networks a node is connected to. | ||
|
||
Clients SHOULD reject addresses that have a different length than specified in this table for a specific address ID, as these are meaningless. | ||
Clients SHOULD NOT gossip addresses from unknown networks because they have no means to validate those addresses and so can be tricked to gossip invalid addresses. | ||
|
||
Further network ID numbers MUST be reserved in a new BIP document. | ||
|
||
Clients SHOULD reject messages that contain addresses that have a different length than specified in this table for a specific network ID, as these are meaningless. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing L123 and L129 here.
RECOMMENDED and SHOULD are synonyms in RFC 2119. If we are intending these to mean two different things, maybe the first (RECOMMENDED) should be changed to MAY?
There is some demand for peers to be able to communicate not being interested in address broadcasts at all. @sdaftuar @sipa @naumenkogs E.g. light clients might not want unsolicited addr messaages and would not partake in addr relays. The only time they'd want to receive address messages is in direct response to This is not possible at this point (the current code assumes every peer takes part in this). But we could change this as part of addrv2. A possibility would be to make this a parameter of the |
Are there distinct cases of "I don't care what addr you send me, but I'm not going to relay any" and "don't send me unsolicited stuff"? Bitcoin Core should care about the former-- as it doesn't want to select non-relayers for low fanout relay. The latter sounds fine too but I dunno if anyone even wants it. |
@gmaxwell If you're suggesting to have 3 levels:
I'm not sure that's desirable. An implementation would want to pick 1-2 fanout peers among the level (3) ones, but relay to some extent to the level (2) ones in addition to that. If that's the case, an attacker would be able to get a higher than proportional number of addr messages revealed to them by making many connections to a peer at level (2). |
No, actually the other way around. I'm suggesting that "I will relay stuff" and "I won't relay stuff" is the only interesting criteria. I don't think anyone cares if peers send them addr messages they don't need. |
I previously thought that this will help with better control of unsolicited addr forwarding, but we'll be fine either way considering this observation by @sipa: I guess they might have the bandwidth constraints and that's the only real reason. And even that is probably no big deal. |
@gmaxwell I think my point is that whether we treat it as "i will/won't relay" or as "i want/don't want unsollicited addresses" doesn't matter, because how we treat them should be identical: if someone doesn't want addresses at all, we can use that to bias relay away from them, but we can't bias way from them but still relay to them. I think it's easiest to call it a "give me unsolicited addresses or not" flag and bitcoin core will then relay uniformly over all that want addresses. |
Seems like it's just a difference in how we describe the boolean, and not what we expect the intended behavior to be -- but I think it would make sense to add a boolean that we could use to determine whether the peer is participating in addr relay or not (which in turn would determine whether we send them unrequested addr messages or not). The only distinction I can think of between those two ideas is if a node might want to announce its own address to all its peers, even the ones that don't participate in addr relay? If we would, then I think characterizing the boolean as "participates in addr relay" would be the right descriptor. |
Then setting it to false might be (mistakenly I assume?) interpreted as "a node doesn't respond to GETADDR request". |
What do people think about separating message types so that the response to a Right now we have no ability to differentiate the two; in Bitcoin Core we make a guess based on whether we've sent a I'm not sure if this is too large a change to make this late in the process -- curious what others think. |
The discussion in #19794 demonstrated that it can be useful to clarify the I see 2 alternatives:
Right now we sort of rely on the combination of these two it seems. #19794 demonstrated that these alternatives are hard to reason about. I think an explicit flag in the |
If we differentiate the two, then what shall we do if we get an unsolicited "response to getaddr"? Why is it that we don't want to gossip responses to getaddr? Why not always gossip up to 10 randomly picked addresses, regardless of whether we sent getaddr or not? The current code seems a bit strange - if we receive an |
I think we can just specify in the BIP how errors are handled; my inclination is that disconnecting peers who violate the spec is probably the best way to ensure that software is coded up correctly when implemented, but this could be discussed if others feel differently.
Though relevant, I don't think this is the central question -- I think the central question is, could there ever be situations where differentiating between the response to a I would add that one of the lessons I learned after implementing BIP 130 (sendheaders) is that reusing I think that same basic lesson applies here: it's basically free to disambiguate between the two, implementations that don't care can just ignore the disambiguation and treat both types of messages the same, while implementations that want to do something different would have the ability to do so. In my view the main question would be whether it's worth wrapping such a change into this proposal (which has been open for some time and making progress towards deployment), or deferring this proposal to the future? |
@sdaftuar I think there may be another angle to this, which makes addr handling very different from block header handling: headers are verifiable data structures, which permit detecting/disconnecting/punishing peers that send invalid ones. Addresses are just unauthenticated data, so if semantics differ between "addresses just gossipped" and "addresses in response to getaddr", we should be mindful that an attacker could at any time choose between them without any real consequences. It does seem simpler for protocol handling to distinguish them, but wouldn't this work as well?
I'm also happy to be convinced that there aren't any useful attacks that could be used if the two types of responses are distinguished. But given that addr's timestamps are already kind of a proxy for "currently relyable", maybe there is no need? |
I think this works in honest cases, but when considering an adversary, it feels like the mechanism you described is more prone just due to complexity. Remember that we sometimes do unsolicited relay on behalf of other nodes (if addr < 10) and such. "Malicious load handler" and rate limiters may be tricked by this too. "Explicit ADDR flag" is, on the other hand, as robust as current ADDR handling, at least the implementation I envision in context of #19794 it can be. This is just my intuition for now, I don't have an exact way of breaking your mechanism, especially since it's only an idea. |
Do we want to require nodes to perform some minimum sanity checks on address types they gossip? Currently the BIP only requires that the length field is enforced. The currently proposed Bitcoin Core implementation bitcoin/bitcoin#19845 performs an additional check, namely the TORv3 checksum, but just ignores the item if there's a problem. The BIP lists additional properties that a node could check, such as the public key being a valid ed25519 pubkey. The scenario I'm worried about is an attacker gossiping a whole bunch of nonsense that's expensive for nodes to verify, but with no way to punish them because we can't tighten this rule later. I'm also a bit worried about nodes receiving - even if they don't relay - huge amounts of nonsense in the form of fake "future addresses". But afaik it doesn't matter what the maximum size is per address, so it's a bit orthogonal to the changes in this PR. |
LGTM. I think it would be good to have these changes (which are now being discussed as part of BIP155 anyway) reflected in the BIP. There are a few follow-up discussions still happening here, but I think it would be better if the BIP is updated to at least the things in this PR. |
@Sjors There are no checksums in BIP155 TORv3 handling, only in the conversion from/to human readable ".onion" notation. I don't think any further checks should be performed. Sending bogus invalid addresses is not more harmful to the network than sending bogus valid addresses, so no need to waste time on it. Invalid ones will be caught by Tor, which is in a much better position to judge validity. |
I don't think this is exactly true, at at least it shouldn't be -- "addresses in response to getaddr" would only be allowed if a However if we put some reasonable bounds on the response I think this concern should go away (one simple rule would be that addr-response messages must come sequentially, so that as soon as a network message arrives that isn't an addr-response, the response is over -- we could additionally bound the time from sending a With something like that, we could view addr traffic as two separate protocols:
So these protocols can only overlap after an addr_request is sent -- outside of such a window, these two protocols are non-overlapping. If we had either one and not the other, it's hard to imagine that the system would be definitely attackable by an adversary, and it's hard for me to imagine that if we had one already, that adding the other would definitely break anything, since mostly these don't overlap. That is not to say that any given implementation couldn't be broken -- but to argue against having a way to distinguish, I think you'd have to argue that all possible (useful) implementations where the two kinds of data are distinguished are definitely attackable? I suspect that we have insufficient study/research at this point to be able to make that claim, so given that we have at times in the past (and at present in the Bitcoin Core code) tried to distinguish these cases, I'd say we should go ahead and do so at the protocol level if we have a good opportunity for doing so. My basic assumption here is that honest nodes participating in this protocol would make some things more efficient, while it shouldn't necessarily be more vulnerable to attack by an adversary than today's protocol. |
@sdaftuar Right - if you think of it as two different protocols that seems clearly fine. I have a feeling it's possible to merge them, but that's probably harder than simply adding a new message here while the rest of address relay is being touched anyway. |
ACK 350189a |
Increase the maximum length of an address from 32 to 512 bytes and
clarify that the entire message should be rejected if it contains
a longer address.
(from BIP 155: addrv2 BIP proposal #766 (comment))
Remove a contradiction about unknown address types - "MUST ignore" VS
"MAY store and gossip".
Recommend gossiping addresses for unknown network ID and for networks
to which the node is not currently connected to.
(from BIP 155: addrv2 BIP proposal #766 (comment))
Clarify that the entire message should be rejected if it contains an
address with unexpected size (e.g. IPv4 address with length 5).