p2p: Advance pindexLastCommonBlock early after connecting blocks #32180
Force-pushed from bb5aa0a to 205e651
Concept ACK.
src/net_processing.cpp (outdated)
// If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
// allows us to determine nWindowEnd better.
if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    state->pindexLastCommonBlock = m_chainman.ActiveTip();
}

// If the peer reorganized, our previous pindexLastCommonBlock may not be an ancestor
// of its current tip anymore. Go back enough to fix that.
state->pindexLastCommonBlock = LastCommonAncestor(state->pindexLastCommonBlock, state->pindexBestKnownBlock);
Could there be an issue if both nodes reorganize to the same chain, but the last common block is still on another fork? If our new tip height is less than the outdated pindexLastCommonBlock height (like when reorging to a shorter chain with more work), we'd skip the optimization when we shouldn't. Since LastCommonAncestor() brings pindexLastCommonBlock back onto the peer's best chain, maybe the new code should come after this call?
Not sure if this edge case matters in practice, but thought I'd mention it.
Good observation! I also think that this edge case shouldn't matter much.
It's absolutely essential that pindexLastCommonBlock is an ancestor of pindexBestKnownBlock when calling FindNextBlocks, which is why I originally left that check last. But the added check can't change that, since it only sets pindexLastCommonBlock to the tip if that tip is an ancestor of pindexBestKnownBlock.
So I've changed the order.
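For reference, a minimal sketch of the resulting order, using simplified stand-in types (BlockIndex, AdvanceLastCommon and the linear-walk GetAncestor below are illustrative only, not Bitcoin Core's CBlockIndex, chain.cpp's LastCommonAncestor, or the real FindNextBlocksToDownload): reconcile with the peer's chain first, then skip ahead to our tip only when that tip is an ancestor of the peer's best known block, so the ancestor invariant required by FindNextBlocks is preserved.

// Simplified model of the ordering discussed above; all names here are
// illustrative stand-ins, not the real implementation.
struct BlockIndex {
    int nHeight{0};
    const BlockIndex* pprev{nullptr};
    const BlockIndex* GetAncestor(int height) const
    {
        const BlockIndex* p{this};
        while (p != nullptr && p->nHeight > height) p = p->pprev; // linear walk; the real code uses skip pointers
        return p;
    }
};

// Walk the higher candidate down to the lower one's height, then step both back
// until they meet.
const BlockIndex* LastCommonAncestor(const BlockIndex* a, const BlockIndex* b)
{
    if (a->nHeight > b->nHeight) a = a->GetAncestor(b->nHeight);
    else if (b->nHeight > a->nHeight) b = b->GetAncestor(a->nHeight);
    while (a != b && a != nullptr && b != nullptr) {
        a = a->pprev;
        b = b->pprev;
    }
    return a;
}

const BlockIndex* AdvanceLastCommon(const BlockIndex* last_common,
                                    const BlockIndex* best_known,
                                    const BlockIndex* our_tip)
{
    // 1) Make last_common an ancestor of best_known (handles peer reorgs).
    last_common = LastCommonAncestor(last_common, best_known);
    if (last_common == best_known) return last_common; // nothing left to download from this peer
    // 2) Only then skip ahead to our tip, and only if the tip is itself an
    //    ancestor of best_known, so the invariant from step 1 is preserved.
    if (our_tip->nHeight > last_common->nHeight &&
        best_known->GetAncestor(our_tip->nHeight) == our_tip) {
        last_common = our_tip;
    }
    return last_common;
}

Running the tip-advance after LastCommonAncestor means the optimization still applies when the old pindexLastCommonBlock sat on a stale fork, which is the edge case raised above.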
Concept ACK. Thanks for this PR. I have made some notes regarding it and will add a guide for testing very soon.
Force-pushed from 205e651 to c69ee2d
ACK c69ee2d
The code changes look reasonable. I tested that p2p_ibd_stalling.py fails without commit 1d8504c ("p2p: During block download, adjust pindexLastCommonBlock right away"). It failed 7 times and passed 1 time. Is it expected to pass sometimes?
I also checked that the code inside the modified ifs is executed during p2p_ibd_stalling.py.
// If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
// allows us to determine nWindowEnd better.
if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    state->pindexLastCommonBlock = m_chainman.ActiveTip();
}
This change assumes their tip is higher than our tip. What if their tip is lower? Maybe worth advancing pindexLastCommonBlock in that case as well. Something like this:

// If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to
// download any blocks in between, and skipping ahead here allows us to determine nWindowEnd better.
const auto our_tip{m_chainman.ActiveTip()};
const auto peer_best_known{state->pindexBestKnownBlock};
if (our_tip->nHeight > state->pindexLastCommonBlock->nHeight) {
    if (peer_best_known->nHeight >= our_tip->nHeight) {
        if (peer_best_known->GetAncestor(our_tip->nHeight) == our_tip) {
            state->pindexLastCommonBlock = our_tip;
        }
    } else {
        if (our_tip->GetAncestor(peer_best_known->nHeight) == peer_best_known) {
            state->pindexLastCommonBlock = peer_best_known;
        }
        // FindNextBlocks() would be a noop below, so return here.
        return;
    }
}
Then we would have returned earlier because there are no blocks from this peer that would help us with our IBD:
bitcoin/src/net_processing.cpp, lines 1384 to 1387 at 80e6ad9:

if (state->pindexBestKnownBlock == nullptr || state->pindexBestKnownBlock->nChainWork < m_chainman.ActiveChain().Tip()->nChainWork || state->pindexBestKnownBlock->nChainWork < m_chainman.MinimumChainWork()) {
    // This peer has nothing interesting.
    return;
}
Does it make sense to advance pindexLastCommonBlock in that case? I am just curious, it is definitely not the aim of this PR.
I'd say probably not, because pindexLastCommonBlock isn't really used for anything but assigning blocks during IBD. If we will never need any blocks from this peer (until it sends us more headers, but then pindexLastCommonBlock will get updated; I'll also ignore weird invalidateblock scenarios here), I don't see what benefit advancing it anyway would have.
test/functional/p2p_ibd_stalling.py (outdated)

stall_block = blocks[0].sha256
second_stall = blocks[500].sha256  # another block we don't provide immediately
nit: could use an array here:
stall_blocks = [blocks[0].sha256, blocks[500].sha256]
which would make the following code possible:
- peers.append(node.add_outbound_p2p_connection(P2PStaller([stall_block, second_stall]), p2p_idx=id, connection_type="outbound-full-relay"))
+ peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_blocks), p2p_idx=id, connection_type="outbound-full-relay"))
- self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == 2)
+ self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == len(stall_blocks))
done
self.log.info("Check that all outstanding blocks get connected") | ||
self.wait_until(lambda: node.getblockcount() == NUM_BLOCKS) | ||
self.log.info("Check that all outstanding blocks up to the second stall block get connected") | ||
self.wait_until(lambda: node.getblockcount() == 500) |
nit: could use a variable second_stall_block_index = 500 or something like that to avoid repeating the number 500.
changed to use one var for each index, plus one array.
Force-pushed from c69ee2d to b7ff6a6
c69ee2d to b7ff6a6: addressed feedback
It wasn't - but I have tracked this down now. It's a bit involved, I hope the following explanation makes sense:
The initial situation is that there are 2 remaining connected peers, the node has just synced blocks up to height 500, but pindexLastCommonBlock is still at 0. The second stall block at height 501 is already requested from one peer, let's say […]. If SendMessages is invoked for […]
ACK b7ff6a6
But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it
I confirm this, made it reproducible:
- git show 855b1fc2b9 | git apply -R
- observe p2p_ibd_stalling.py is failing (mostly)
- apply the patch below to mess with the order in which nodes are processed
- observe p2p_ibd_stalling.py succeeding
[patch] change order in which nodes are processed
--- i/src/net.cpp
+++ w/src/net.cpp
@@ -3029,12 +3029,14 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
pnode->ConnectedThroughNetwork(),
GetNodeCount(ConnectionDirection::Out));
}
Mutex NetEventsInterface::g_msgproc_mutex;
+static size_t seq{0};
+
void CConnman::ThreadMessageHandler()
{
LOCK(NetEventsInterface::g_msgproc_mutex);
while (!flagInterruptMsgProc)
{
@@ -3043,13 +3045,22 @@ void CConnman::ThreadMessageHandler()
{
// Randomize the order in which we process messages from/to our peers.
// This prevents attacks in which an attacker exploits having multiple
// consecutive connections in the m_nodes list.
const NodesSnapshot snap{*this, /*shuffle=*/true};
- for (CNode* pnode : snap.Nodes()) {
+ auto nodes = snap.Nodes();
+ std::ranges::sort(nodes, [](CNode* a, CNode* b) {
+ return seq % 3 == 0 ? a->GetId() > b->GetId() : a->GetId() < b->GetId();
+ });
+
+ ++seq;
+
+ LogInfo("===== begin");
+ for (CNode* pnode : nodes) {
+ LogInfo("===== node id: %d, disconnect: %d", pnode->GetId(), pnode->fDisconnect);
if (pnode->fDisconnect)
continue;
// Receive messages
bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);
fMoreWork |= (fMoreNodeWork && !pnode->fPauseSend);
@@ -3058,12 +3069,13 @@ void CConnman::ThreadMessageHandler()
// Send messages
m_msgproc->SendMessages(pnode);
if (flagInterruptMsgProc)
return;
}
+ LogInfo("===== end");
}
WAIT_LOCK(mutexMsgProc, lock);
if (!fMoreWork) {
condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this]() EXCLUSIVE_LOCKS_REQUIRED(mutexMsgProc) { return fMsgProcWake; });
}
This allows us to calculate nWindowEnd better. Before, it would be calculated based on an old pindexLastCommonBlock, which could result in blocks not being requested for download that could have been requested, and peers being wrongly marked as stallers.
This reverts commit 49d569c which introduced extra logic for when a snapshot was loaded. With the previous commit, this is no longer necessary because the more general logic of advancing pindexLastCommonBlock also covers this case.
getpeerinfo provides a list of blocks that are inflight, which can be used instead.
This test (which would fail without the previous commit) checks that after the stalling block was received, we don't incorrectly mark another peer as a staller immediately.
Force-pushed from b7ff6a6 to 3bdff9a
ACK 3bdff9a
@@ -1411,6 +1411,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
        return;

    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    // allows us to determine nWindowEnd better.
    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
nano non-blocking nits:
Could extract tip = ActiveTip() and use it everywhere here to make it slightly more readable.
Also, this isn't introduced by this PR, but the second check has always puzzled me a bit. It would be nice to have some util function like IsBlockInChain(const CBlockIndex* block_to_test, const CBlockIndex* chain_tip).
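A possible sketch of that suggested helper (IsBlockInChain is hypothetical, not an existing Bitcoin Core function; it only assumes CBlockIndex's nHeight and GetAncestor from chain.h):

#include <chain.h>

// Hypothetical helper as suggested above: block_to_test is part of the chain
// ending at chain_tip iff the tip's ancestor at the block's height is that very block.
bool IsBlockInChain(const CBlockIndex* block_to_test, const CBlockIndex* chain_tip)
{
    if (block_to_test == nullptr || chain_tip == nullptr) return false;
    if (block_to_test->nHeight > chain_tip->nHeight) return false;
    return chain_tip->GetAncestor(block_to_test->nHeight) == block_to_test;
}

The second check in the diff above could then read something like IsBlockInChain(m_chainman.ActiveTip(), state->pindexBestKnownBlock).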
ACK 3bdff9a

I think it would be nice to include how the newly added logic would also cover the assumeutxo case in the commit description; if my understanding is correct, adding something like: "After snapshot loading, our tip becomes the snapshot block. Since peers are verified to have the snapshot in their chain, our tip is an ancestor of the peer's best block, hence the general advancement logic will move pindexLastCommonBlock from any pre-snapshot position to the snapshot height automatically."
Thanks! I'll address the nits if/when I have to push again.
As described in #32179, pindexLastCommonBlock is updated later than necessary in master. In the case of a linear chain with no forks, it can be moved forward at the beginning of FindNextBlocksToDownload, so that the updated value can be used to better estimate nWindowEnd. This helps the node request all blocks from peers within the correct 1024-block window and avoids peers being incorrectly marked as stallers.

The second commit removes a special case introduced in 49d569c which is not necessary, because the more general logic also covers this case.

I also changed p2p_ibd_stalling.py to cover the situation after a stall has been resolved, making sure that no additional peers are marked for stalling.

Fixes #32179
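To make the window effect concrete, a small self-contained illustration (WindowEnd and the constant below are illustrative stand-ins for how nWindowEnd is derived from pindexLastCommonBlock during block download, not the actual net_processing code; 1024 is the window size mentioned above, and the heights mirror the test scenario from the conversation):

#include <cstdio>

// Illustration only: the download window ends a fixed number of blocks past
// pindexLastCommonBlock, so a stale value caps which blocks may be requested.
static constexpr int BLOCK_DOWNLOAD_WINDOW{1024};

int WindowEnd(int last_common_height)
{
    return last_common_height + BLOCK_DOWNLOAD_WINDOW;
}

int main()
{
    // Blocks up to height 500 are connected, but pindexLastCommonBlock was left at height 0.
    std::printf("stale (height 0):      nWindowEnd = %d\n", WindowEnd(0));   // 1024
    std::printf("advanced (height 500): nWindowEnd = %d\n", WindowEnd(500)); // 1524
    // With the stale value, block 1025 cannot be requested from any peer yet,
    // even though it fits in the window once pindexLastCommonBlock is advanced;
    // an artificially small window is what can lead to peers being wrongly
    // marked as stallers.
    return 0;
}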