-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Evict outbound peers if tip is stale #11534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When in IBD, we'd like to use all our outbound peers to help us sync the chain. Disconnect any outbound peers whose headers have insufficient work.
Currently we have no rotation of outbound peers. If an outbound peer stops serving us blocks, or is on a consensus-incompatible chain with less work than our tip (but otherwise valid headers), then we will never disconnect that peer, even though that peer is using one of our 8 outbound connection slots. Because we rely on our outbound peers to find an honest node in order to reach consensus, allowing an incompatible peer to occupy one of those slots is undesirable, particularly if it is possible for all such slots to be occupied by such peers. Protect against this by always checking to see if a peer's best known block has less work than our tip, and if so, set a 20 minute timeout -- if the peer is still not known to have caught up to a chain with as much work as ours after 20 minutes, then send a single getheaders message, wait 2 more minutes, and if a better header hasn't been received by then, disconnect that peer. Note: - we do not require that our peer sync to the same tip as ours, just an equal or greater work tip. (Doing otherwise would risk partitioning the network in the event of a chain split, and is also unnecessary.) - we pick 4 of our outbound peers and do not subject them to this logic, to be more conservative. We don't wish to permit temporary network issues (or an attacker) to excessively disrupt network topology.
f2e22d1
to
2ea6f2c
Compare
@@ -2862,3 +2878,15 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const | |||
|
|||
return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize(); | |||
} | |||
|
|||
void CConnman::AddToVNodes(CNode &node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only meant for tests?
Otherwise replace the following code with a call to this function?
Lines 1140 to 1143 in ff92fbf
{ | |
LOCK(cs_vNodes); | |
vNodes.push_back(pnode); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is only for the unit test; I'm hoping someone is able to suggest a better way of doing this than what I did here to get the test working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with leaving the hacks for now, though please make them private and a friend to the boost test functions.
After we merge #11457 (and the follow-up for addrman: https://github.com/theuni/bitcoin/tree/move-addrdb), I believe we'll have enough broken out to create 2 CConnman instances and run them against each-other for these tests. At that point, we can remove the test hacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2ea6f2c
to
c071f62
Compare
int64_t worst_peer_connect_time = 0; | ||
for (auto it = mapNodeState.begin(); it != mapNodeState.end(); ++it) { | ||
int64_t connect_time = 0; | ||
if (connman->ForNode(it->first, [&](CNode *pnode){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we might turn this logic around: rather than marking for disconnection, mark for protection from disconnection. For example, each time we get a new header that extends our tip, we could do:
connman->ResetOutboundEvictionTimer(id, current_time + new_timeout);
Then when we need a connection slot, if any timer has expred, whichever has been expired the longest gets dropped.
That saves us from having to care about what type of node (inbound/outbound/oneshot/manual/etc.) this is, and just lets outbound eviction do its thing if the criteria are met.
src/net_processing.cpp
Outdated
@@ -127,6 +127,12 @@ namespace { | |||
/** Number of outbound peers with m_protect_from_disconnect. */ | |||
int g_outbound_peers_with_protect_from_disconnect = 0; | |||
|
|||
/** When to next check whether our tip is stale. */ | |||
int64_t g_tip_stale_check_time = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these into PeerLogicValidation ?
Closing in favor of #11560 |
6262915 Add unit test for stale tip checking (Suhas Daftuar) 83df257 Add CConnmanTest to mutate g_connman in tests (João Barbosa) ac7b37c Connect to an extra outbound peer if our tip is stale (Suhas Daftuar) db32a65 Track tip update time and last new block announcement from each peer (Suhas Daftuar) 2d4327d net: Allow connecting to extra outbound peers (Suhas Daftuar) Pull request description: This is an alternative approach to #11534. Rather than disconnect an outbound peer when our tip looks stale, instead try to connect to an additional outbound peer. Periodically, check to see if we have more outbound peers than we target (ie if any extra peers are in use), and if so, disconnect the one that least recently announced a new block (breaking ties by choosing the newest peer that we connected to). Tree-SHA512: 8f19e910e0bb36867f81783e020af225f356451899adfc7ade1895d6d3bd5afe51c83759610dfd10c62090c4fe404efa0283b2f63fde0bd7da898a1aaa7fb281
6262915 Add unit test for stale tip checking (Suhas Daftuar) 83df257 Add CConnmanTest to mutate g_connman in tests (João Barbosa) ac7b37c Connect to an extra outbound peer if our tip is stale (Suhas Daftuar) db32a65 Track tip update time and last new block announcement from each peer (Suhas Daftuar) 2d4327d net: Allow connecting to extra outbound peers (Suhas Daftuar) Pull request description: This is an alternative approach to bitcoin#11534. Rather than disconnect an outbound peer when our tip looks stale, instead try to connect to an additional outbound peer. Periodically, check to see if we have more outbound peers than we target (ie if any extra peers are in use), and if so, disconnect the one that least recently announced a new block (breaking ties by choosing the newest peer that we connected to). Tree-SHA512: 8f19e910e0bb36867f81783e020af225f356451899adfc7ade1895d6d3bd5afe51c83759610dfd10c62090c4fe404efa0283b2f63fde0bd7da898a1aaa7fb281
6262915 Add unit test for stale tip checking (Suhas Daftuar) 83df257 Add CConnmanTest to mutate g_connman in tests (João Barbosa) ac7b37c Connect to an extra outbound peer if our tip is stale (Suhas Daftuar) db32a65 Track tip update time and last new block announcement from each peer (Suhas Daftuar) 2d4327d net: Allow connecting to extra outbound peers (Suhas Daftuar) Pull request description: This is an alternative approach to bitcoin#11534. Rather than disconnect an outbound peer when our tip looks stale, instead try to connect to an additional outbound peer. Periodically, check to see if we have more outbound peers than we target (ie if any extra peers are in use), and if so, disconnect the one that least recently announced a new block (breaking ties by choosing the newest peer that we connected to). Tree-SHA512: 8f19e910e0bb36867f81783e020af225f356451899adfc7ade1895d6d3bd5afe51c83759610dfd10c62090c4fe404efa0283b2f63fde0bd7da898a1aaa7fb281
6262915 Add unit test for stale tip checking (Suhas Daftuar) 83df257 Add CConnmanTest to mutate g_connman in tests (João Barbosa) ac7b37c Connect to an extra outbound peer if our tip is stale (Suhas Daftuar) db32a65 Track tip update time and last new block announcement from each peer (Suhas Daftuar) 2d4327d net: Allow connecting to extra outbound peers (Suhas Daftuar) Pull request description: This is an alternative approach to bitcoin#11534. Rather than disconnect an outbound peer when our tip looks stale, instead try to connect to an additional outbound peer. Periodically, check to see if we have more outbound peers than we target (ie if any extra peers are in use), and if so, disconnect the one that least recently announced a new block (breaking ties by choosing the newest peer that we connected to). Tree-SHA512: 8f19e910e0bb36867f81783e020af225f356451899adfc7ade1895d6d3bd5afe51c83759610dfd10c62090c4fe404efa0283b2f63fde0bd7da898a1aaa7fb281
This builds on #11490 and is related to the work there, but I've separated this for review in order to not hold up #11490 (which I think is done).
This pr attempts to implement a strategy suggested by @gmaxwell, of choosing a peer for outbound eviction if our tip has not been updated in a while. I'd like to suggest this for consideration in 0.15.0.2 as well, as it is designed to mitigate p2p disruption in the event that all our outbound peers stop relaying blocks to us.
@theuni If you have a chance to review, I could use feedback on the first new commit here (
Add hacky accessors for manipulating connman peers in tests
). I did the quickest thing I could in order to get a unit test working, but I'm pretty sure I violated all sorts of design goals, so I could use some guidance about the right way to add peers to connman in a unit test.