Skip to content

Conversation

sdaftuar
Copy link
Member

@sdaftuar sdaftuar commented Aug 5, 2020

Onion peers are disadvantaged under our eviction criteria, so prevent eventual
eviction of them in the presence of contention for inbound slots by reserving
some slots for localhost peers (sorted by longest uptime).

Block-relay-only connections exist as a protection against eclipse attacks, by
creating a path for block propagation that may be unknown to adversaries.
Protect against inbound peer connection slot attacks from disconnecting such
peers by attempting to protect up to 8 peers that are not relaying transactions
but have provided us with blocks.

Thanks to gmaxwell for suggesting these strategies.

@sdaftuar
Copy link
Member Author

sdaftuar commented Aug 5, 2020

This should fix #19500, but I need to figure out how to test this...

@sdaftuar sdaftuar force-pushed the 2020-08-improved-eviction branch from 75533f2 to ce35ffe Compare August 5, 2020 19:07
@DrahtBot DrahtBot added the P2P label Aug 5, 2020
@sdaftuar sdaftuar force-pushed the 2020-08-improved-eviction branch from ce35ffe to e5af0d1 Compare August 6, 2020 01:01
@fanquake fanquake added this to the 0.21.0 milestone Aug 6, 2020
@practicalswift
Copy link
Contributor

Concept ACK (modulo proper testing of course :))

@naumenkogs
Copy link
Member

Concept ACK.

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK on the onion peers protection. We might do the block-relay-only peer protection a bit better.

@@ -903,12 +918,31 @@ bool CConnman::AttemptToEvictConnection()
// Protect 4 nodes that most recently sent us transactions.
// An attacker cannot manipulate this metric without performing useful work.
EraseLastKElements(vEvictionCandidates, CompareNodeTXTime, 4);
// Protect up to 8 non-tx-relay peers that have sent us blocks.
std::sort(vEvictionCandidates.begin(), vEvictionCandidates.end(), CompareNodeBlockRelayOnlyTime);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nLastBlockTime is updated at ProcessNewBlock return if the process block is new to us and connecting on our best chain, among other yardsticks. An attacker-controlled inbound connection by always being the fastest to announce block to us can trump other honest block-relay-only inbound to prevent update of their nLastBlockTime. I think assuming a malicious peer can be always the fastest is reasonable, just connect to Fibre and bypass some validation checks. By rotating fastest block announcement among few of its inbound connections, a malicious attacker seems to win over this protection.

Have you consider instead selecting randomly among all our block-relay-only peers 8 of them ? I think that would make this protection better as an attacker can't predict its block-relay-only inbound will be favored. Because if an attacker can forge the fast-block-announcement characteristic as protected by the following protection ("Protect 4 nodes that most recently sent us blocks"), it's not hard to also forge this new one as signaling as block-relay-only is cheap.

Correct me I I'm wrong about ProcessNewBlock and nLastBlockTime updates, this part is fairly complex with all the anti-DoS/trickery measures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about the code and behavior -- an adversary that is trying to be well-behaved by being the first to deliver blocks to us, from 8 different block-relay peers, can indeed take over this protection -- but randomly choosing among block-relay-only peers is not helpful protection. If we choose randomly each time, an attacker can game that by merely spamming us with inbound connections until any given peer that had protection in a prior round has lots its protection in this round, and might get evicted.

Note also that it is free to pretend to be a blocks-only full-node peer; you merely send the right version message to us on connection. It is more work, and useful work, to be the first peer to deliver us a new block. So randomly protecting inbound peers who turn off tx-relay is not a useful thing to do, even if your random selection were on a longer timer to try to avoid the spamming problem I described above.

One way to think about the protections here is: if we can pick out a set of peers that may be useful to us, protect them. Because the alternative to protection is that we may lose peers that are demonstrably useful just because an attacker is spamming us. If we make a mistake and include peers that happen to also be adversaries because they're acting like good peers, that's no worse than if we didn't add the protection at all; but if we can meaningfully increase the cost to an adversary of bypassing the protection, then we've accomplished something.

So maybe a poor criteria would be one where an attacker does useful work for us once and then can misbehave but still take over most or all of our protected spots. I don't think that's what is happening here (arguably, earliest connection time has the risk of an attacker being able to take it over, eventually -- but the first sort is on most-recently-gave-us-a-block, so an attacker would have to be able to keep doing this to ensure that other peers don't gain an advantage).

Copy link
Member Author

@sdaftuar sdaftuar Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe a poor criteria would be one where an attacker does useful work for us once and then can misbehave but still take over most or all of our protected spots. I don't think that's what is happening here (arguably, earliest connection time has the risk of an attacker being able to take it over, eventually -- but the first sort is on most-recently-gave-us-a-block, so an attacker would have to be able to keep doing this to ensure that other peers don't gain an advantage).

I do worry that this might be the case, however, because block-relay-only peers may not be the first to deliver us blocks very often (it's possible that due to compact block relay, there are biases in favor of compact block reconstruction happening best on links that are also relaying transactions -- I am not sure). In which case, the attack you describe of an adversary connecting 8 times, delivering the next 8 blocks from each one, and then being silent, may well work for a while to cause only their links to be protected, because the honest peers have a tough time beating other peers.

We can try to get some anecdotal data on this by looking at our own existing nodes to see if apparent block-relay peers are being selected as HB compact block peers? @gmaxwell any thoughts here?

EDIT: I guess my overall thought is that even if this is somewhat gameable, adding this won't hurt and may help. If anyone comes up with something better we can always improve it.

Copy link

@ariard ariard Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, after writing this, I realized thinking further that my counter-proposal wasn't worthy it. As you hinted, analyzing eviction as a serie of interdependent events means sooner or latter, an attacker still has a high-chance of success.

If we make a mistake and include peers that happen to also be adversaries because they're acting like good peers, that's no worse than if we didn't add the protection at all; but if we can meaningfully increase the cost to an adversary of bypassing the protection, then we've accomplished something.

Thanks for the explanation. I agree that the current protection of fast-delivery-block peers is worth it and compel an attacker to position its nodes well with regard to block-topology and propagation, even if potentially gameable. Where I was more doubtful was on the value of adding a second protection relying on an-already used assumption. Even I was presuming that an attacker, by forgetting validation because it's a zero-cost accepting invalid blocks, will always trump block announcement, independently of being in concurrence with compact block peers.

I think there is the interesting property with block-relay-only peers that it's far far harder to guess their pairing, as you remember back in commit. Whereas our protected-by-netgroup peers (1st heuristic) might be full-relay ones, and thus an attacker might be able to guess their IP prefix by tx-relay topology probing.

Block-relay-only peers don't have this weakness so it could be more worthy to combine them with the netgroup critera rather than with the fast-block-delivery one ?

Overall, if we can't come up with better ideas, you right let's move further and try to address this kind of issues in future works on network anomalies detection/anti-eclipse.

Copy link
Contributor

@gmaxwell gmaxwell Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ariaid There are, unfortunately, a lot of "nodes" that just connect and don't do anything useful. Reserving slots for zombies that don't even once do something useful isn't very helpful. Unfortunately the software doesn't really have that much criteria for deciding usefulness and most things you could imagine could be easily faked. One advantage of favouring many peers that have recently sent blocks is that even if the leader is someone who's only making themselves first for a little while before misbehaving there are still many other slots. (Also, aside, the public fibre network is gone and the software is now unmaintained)

Keep in mind that blocks-only peers are currently disadvantaged because they're ineligible for the CompareNodeTXTime criteria-- yet they are important: because they're difficult for an attacker to map they are intended form an important part of the network's protection against an attacker carving partitions into the network via DOS attacks. It would be extremely unfortunate to have that protection get undermined by a simple connection flooding attack that gets most of the blocks-only peers evicted.

You can see the recent blocks protection for blocks-only peers being morally similar to reserving part of the longest uptime slots for localhost/onion connections. In this case actually 'reserving' doesn't make sense because the number protected for that reason is relatively few. Basically there are now two reasons for a group to exist: Spreading connections across orthogonal kinds of being good under the theory that its harder for an attacker to be good in many ways, and spreading connections across different kinds of (useful) peers.

One could even imagine a generalization where you have a collection of different kinds of peers {ipv4, ipv6, onion, local-network, blocks-only, accepts-inbound-according-to-the-feeler-test} and different kinds of goodness {latency, block relay, tx relay, netgroups, uptime} and then you form a cross-product of the two sets and keep the best in each bucket--- but currently there just aren't that many slots to fight over.

If it were changed to netgroup based, I'd still recommend nLastBlockTime!=0 and fRelevantServices as sorting criteria ahead of a random netgroup preference.

At one point I wanted to add a generic "helpful" flag that was based on some a set of criteria that judged if the peer was probably capable of serving us a novel block-- do they have the right services? if not-- not helpful. have they recently given us a block? helpful. have they recently given us a new transaction we accepted? helpful. Have they been recently announcing blocks we eventually accepted? helpful. Otherwise not helpful. ... stuff like that-- so essentially all working peers should be flagged as helpful.

And then have all the eviction criteria (except perhaps the longest uptime) use the helpful flag as the first sorting key, so that reserved slots won't be taken by stuff that is very obviously broken. Attackers can feign helpfulness-- sure, but some won't bother. And weirdly broken nodes (some of which are just incompetent attackers) are common and don't feign anything.

I just found it to be too difficult to meet people's expectations for testing for this kind of behaviour.

Copy link

@ariard ariard Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmaxwell thanks for your explanation, I would push your model a bit further, beyond types of peers/goodness, you can classify goodness criterias between themselves, with regards to hardness assumptions on attackers. Some are good-behavioral monitoring (block-delivery, stable connections, ...), other costly-topological-identifier (asmap, net-address), other non-observable ones (netgroup sort, you could imagine peer nonce sort).

It needs further research but I fear good-behavioral ones are on the low scale of hardness, because once you have an attacker infrastructure deployed it sounds a marginal cost to fake good-behavioral ones (e.g learn the block first, give it to all your node puppets). These types of goodness are the ones we use the most, I don't deny their usefulness but I lean to think about them more like sanitization measures than inbound slot overtakes protections. By pruning out zombies nodes like you underscore, they optimize your resources, in the sense for a given bucket, increase potential utility.

With regards to switching to a pro-active logic, like the "helpful" described, it defavor low-grade full-nodes even when they are protocol compliant. I presume that's already an effect of our eviction heuristics but I would be cautious going further and such breaking some network-wise diversity of peers. Obviously pushing for more efficiency-based pairing isn't great for robustness.

Stricter slot allocation, especially based on netgroup might have the side-effect to facilitate some net-splits, like if you allocate X slots based on netgroup ABC, an attacker, once deployed in this netgroup can exhaust the quota towards every listening peers. I guess peers netaddresses aren't evenly distributed across netgroup so you might be able to drop a fair chunk of the network.

Note, we do addr-relay for blocks-only peers, not for block-relay-only. So it might be easier for an attacker to map them than expected. I don't think we want to severe addr-relay for blocks-only as otherwise it would be harder (impossible?) for them to learn about new addresses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other non-observable ones (netgroup sort, you could imagine peer nonce sort).

Not sure if it was clear, but the motivation for netgroup diverse peers was also along the lines of trying to restore topology diversity.

Many criteria-- latency, block relay, and tx relay all somewhat favor peers that are geographically close. So you don't want a situation where most nodes are just incestuously linked within the same geography, resulting in a small min-cut to other continents. It's easy to imaging the network becoming that way over time when you factor in the fact that further links crossing more networks are a little less reliable than shorter ones.

I fear good-behavioral ones are on the low scale of hardness, because once you have an attacker infrastructure deployed it sounds a marginal cost to fake good-behavioral ones

Sure, though if they stop being good, they'll quickly lose the advantage it gave them. ... assuming they haven't already successfully eclipsed you entirely. If they keep being good then hurrah, you won.

, once deployed in this netgroup can exhaust the quota towards every listening peers

Yes, though the preference isn't directly observable.

presume that's already an effect of our eviction heuristics but I would be cautious going further and such breaking some network-wise diversity of peers

I really disagree. It's a reason to be not overly specific in what constitutes helpful, but there are a pretty obvious set of behaviours (like relaying us blocks) which if a peer doesn't do, it isn't helping us stay non-partitioned from the honest network. A peer is free to not to helpful stuff, they wouldn't be disconnected directly for it-- but for criteria that is expressly intended to protect the nodes' connectivity, it can make sense to filter it.

I don't even think it would be completely absurd to reserve some connection capacity for peers which claim to be exactly the same protocol & subversion as you. Sure, a malicious party could spoof it, always matching it-- but there have been several incidents in the past where we were concerned about fork software accidentally engaging in a partitioning attack: Say Mallory release a fork of the software that on some date will change its consensus rules to require every block sends half its coinbase to her, Mallory isn't malicious, just stupid. She suckers a fair number of people into running this fork. When the time comes, your node is, by chance, totally surrounded by BitcoinMallory nodes. No one mines any Mallory blocks (or, if they do-- the Mallory node graph may be partitioned and not get them to any of your neighbours) and the Mallory peers block the 'invalid' valid blocks from reaching you. Without any blocks showing up you can't tell the Mallory nodes consensus rules differ. The stale tip detection is a last ditch hail marry to recover from this state, but it may not work well when you need it (e.g. due to honest nodes being full or attacked, or just taking a long time to find one that's up and not a Mallory node). Wouldn't it have been better if a couple of your connections were running the same software as you-- so that if any of them were connected to the graph of healthy nodes you would be too?

It's critical to keep in mind that broken software radically outnumbers attackers at almost all times. Esp because if countermeasures against attackers are generally effective then there won't be attackers. Also, if merely being a "broken" peer is enough to harm the network

Note, we do addr-relay for blocks-only peers,

It's a choice of the peer if they want to announce themselves through us. You make a great point that a peer might be wise to suppress its announcements toward and from its outbound block-only connections because those connections exist in part to help obscure the node connectivity graph. I thought they already did this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering how easy it is to be the first to announce a block and then launch a rotation right after, I'm not very positive abound this solution.

I acknowledge that finding these criteria is hard though.
I would suggest "take all no-tx nodes announced a block within a day, and save the most long-living connections among them".

It would still protect from completely fake nodes (no blocks at all), and impose a stronger attack cost I'd say (in terms of connection lifetime). Another benefit is slower-but-persistent honest blocks-only peers won't be evicted.

Not a strong opinion, I'm not a big fan of long-lifetime bias either, it just feels a bit stronger for now.
For now gonna ACK the existing code.

@sdaftuar sdaftuar force-pushed the 2020-08-improved-eviction branch from e5af0d1 to ec4ee3f Compare August 6, 2020 15:47
@instagibbs
Copy link
Member

@wtogami

@@ -887,7 +902,7 @@ bool CConnman::AttemptToEvictConnection()
node->nLastBlockTime, node->nLastTXTime,
HasAllDesirableServiceFlags(node->nServices),
peer_relay_txes, peer_filter_not_null, node->addr, node->nKeyedNetGroup,
node->m_prefer_evict};
node->m_prefer_evict, node->addr.IsLocal()};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered that doing IsLocal(node->addr) is different from node->addr.IsLocal(), and the latter appears to be correct, but if someone can confirm that this captures the right idea I'd appreciate it.

Copy link
Contributor

@dongcarl dongcarl Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat confusingly:

  • node->addr.IsLocal() returns true only if node->addr is IPv[46] and is in the loopback address ranges
  • IsLocal(node->addr) checks whether or not node->addr resides in mapLocalHost, which tracks our view of addresses we might be referred to externally

So in this case, I believe your use of node->addr.IsLocal() is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. addr.IsLocal() is right. IsLocal() ought to be renamed to IsProbablyMyAddress. :)

@sdaftuar
Copy link
Member Author

sdaftuar commented Aug 6, 2020

Updated to fix two bugs I discovered, and I did some light manual testing and verified that this seems to now do what is intended.

@ghost
Copy link

ghost commented Aug 6, 2020

@sdaftuar I've boldly copied all the changes to net.cpp and recompiled/rebuild bitcoind (0.20.0). I will report back if it solves my issue but that will take some time (a few weeks or longer).

@gmaxwell
Copy link
Contributor

ACK ec4ee3f

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK ec4ee3f.

@@ -845,6 +852,14 @@ static bool CompareNodeTXTime(const NodeEvictionCandidate &a, const NodeEviction
return a.nTimeConnected > b.nTimeConnected;
}

// Pick out the potential block-relay only peers, and sort them by last block time.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have to modify, you might change to non-tx-relay peers/CompareNonTxRelayPeers as comment above the sort is referring to.

vEvictionCandidates.erase(std::remove_if(vEvictionCandidates.end() - local_erase_size, vEvictionCandidates.end(), [](NodeEvictionCandidate const &n) { return n.m_is_local; }), vEvictionCandidates.end());
// Calculate how many we removed, and update our total number of peers that
// we want to protect based on uptime accordingly.
total_protect_size -= initial_size - vEvictionCandidates.size();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write it as size_t total_protect_size = initial_size / 2 - (initial_size - vEvictionCandidates.size()) and thus save one line above. I found this equation expressing better that up to 1/2 remaining peers are protected in function of effectively protected localhost.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way I wrote it makes sense, so not inclined to change this (seems like mostly style and not actually any more or less confusing?).

@sdaftuar
Copy link
Member Author

Anything left here? From #19500 it seems this fixes the user's issue.

@naumenkogs
Copy link
Member

ACK ec4ee3f

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 1, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

Onion peers are disadvantaged under our eviction criteria, so prevent eventual
eviction of them in the presence of contention for inbound slots by reserving
some slots for localhost peers (sorted by longest uptime).

Block-relay-only connections exist as a protection against eclipse attacks, by
creating a path for block propagation that may be unknown to adversaries.
Protect against inbound peer connection slot attacks from disconnecting such
peers by attempting to protect up to 8 peers that are not relaying transactions
but appear to be full-nodes, sorted by recency of last delivered block.

Thanks to gmaxwell for suggesting these strategies.
@sdaftuar sdaftuar force-pushed the 2020-08-improved-eviction branch from ec4ee3f to 752e6ad Compare September 2, 2020 13:22
@laanwj
Copy link
Member

laanwj commented Sep 3, 2020

I think this strategy makes sense.
Code review ACK 752e6ad

@laanwj laanwj merged commit 69a13eb into bitcoin:master Sep 3, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 3, 2020
…m eviction

752e6ad Protect localhost and block-relay-only peers from eviction (Suhas Daftuar)

Pull request description:

  Onion peers are disadvantaged under our eviction criteria, so prevent eventual
  eviction of them in the presence of contention for inbound slots by reserving
  some slots for localhost peers (sorted by longest uptime).

  Block-relay-only connections exist as a protection against eclipse attacks, by
  creating a path for block propagation that may be unknown to adversaries.
  Protect against inbound peer connection slot attacks from disconnecting such
  peers by attempting to protect up to 8 peers that are not relaying transactions
  but have provided us with blocks.

  Thanks to gmaxwell for suggesting these strategies.

ACKs for top commit:
  laanwj:
    Code review ACK 752e6ad

Tree-SHA512: dbf089c77c1f747aa1dbbbc2e9c2799c628028b0918d0c336d8d0e5338acedd573b530eb3b689c7f603a17221e557268a9f5c3f585f204bfb12e5d2e76de39a3
@bitcoin bitcoin deleted a comment from taurus228 Jan 8, 2021
laanwj added a commit that referenced this pull request Mar 30, 2021
…eviction protection test coverage

0cca08a Add unit test coverage for our onion peer eviction protection (Jon Atack)
caa21f5 Protect onion+localhost peers in ProtectEvictionCandidatesByRatio() (Jon Atack)
8f1a53e Use EraseLastKElements() throughout SelectNodeToEvict() (Jon Atack)
8b1e156 Add m_inbound_onion to AttemptToEvictConnection() (Jon Atack)
72e30e8 Add unit tests for ProtectEvictionCandidatesByRatio() (Jon Atack)
ca63b53 Use std::unordered_set instead of std::vector in IsEvicted() (Jon Atack)
41f84d5 Move peer eviction tests to a separate test file (Jon Atack)
f126cbd Extract ProtectEvictionCandidatesByRatio from SelectNodeToEvict (Jon Atack)

Pull request description:

  Now that #19991 and #20210 have been merged, we can determine inbound onion peers using `CNode::m_inbound_onion` and add it to the localhost peers protection in `AttemptToEvictConnection`, which was added in #19670 to address issue #19500.

  Update 28 February 2021: I've updated this to follow gmaxwell's suggestion in #20197 (comment).

  This branch now protects up to 1/4 onion peers (connected via our tor control service), if any, sorted by longest uptime. If any (or all) onion slots remain after that operation, they are then allocated to protect localhost peers, or a minimum of 2 localhost peers in the case that no onion slots remain and 2 or more onion peers were protected, sorted as before by longest uptime.

  This patch also adds test coverage for the longest uptime, localhost, and onion peer eviction protection logic to build on the welcome initial unit testing of #20477.

  Suggest reviewing the commits that move code with `colorMoved = dimmed-zebra` and `colorMovedWs = allow-indentation-change`.

  Closes #11537.

ACKs for top commit:
  laanwj:
    Code review ACK 0cca08a
  vasild:
    ACK 0cca08a

Tree-SHA512: 2f5a63f942acaae7882920fc61f0185dcd51da85e5b736df9d1fc72343726dd17da740e02f30fa5dc5eb3b2d8345707aed96031bec143d48a2497a610aa19abd
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 15, 2021
Summary:
```
Onion peers are disadvantaged under our eviction criteria, so prevent
eventual eviction of them in the presence of contention for inbound
slots by reserving some slots for localhost peers (sorted by longest
uptime).

Block-relay-only connections exist as a protection against eclipse
attacks, by creating a path for block propagation that may be unknown to
adversaries. Protect against inbound peer connection slot attacks from
disconnecting such peers by attempting to protect up to 8 peers that are
not relaying transactions but appear to be full-nodes, sorted by recency
of last delivered block.
```

Backport of [[bitcoin/bitcoin#19670 | core#19670]].

Note that tests will be added when porting this PR:
bitcoin/bitcoin#20477

Ref T1611.

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, PiRK

Reviewed By: #bitcoin_abc, PiRK

Maniphest Tasks: T1611

Differential Revision: https://reviews.bitcoinabc.org/D9676
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow to specify the number of Tor connections and clearnet connections separately