
Conversation

Contributor

@morcos morcos commented Dec 30, 2016

This is built off of #9375, #9252, and #9400. I'll properly rebase it when those are merged.

It provides special logic to issue a second getdata request for a cmpctblock if there is only 1 request outstanding and we think the announced block would be our new tip.

It also changes the cmpctblock processing logic to be first come first served (regardless of in flight block requests) and allow two simultaneous compact block reconstructions.

So in particular, it is now possible to:

  • receive headers -> request cmpctblock from peer 1
  • receive headers -> request cmpctblock from peer 2
  • receive cmpctblock -> request blocktxn from peer 3
  • receive cmpctblock -> request blocktxn from peer 4

Upon receiving a cmpctblock from peers 1 or 2 after this, it will be treated the same as receiving an unsolicited cmpctblock from peer 5 and it will attempt opportunistic reconstruction but not request blocktxn. It will also remove the block in flight for peer 1 or 2.

I believe that the multiple requests are an acceptable increase in bandwidth in order to provide robustness against a single peer stalling us at any point in the logic. The reason to allow a second request for a cmpctblock is to give LB peers a chance to become HB even if there is a staller who is always announcing first; otherwise, only existing HB peers would have a chance to deliver the block.

@morcos
Contributor Author

morcos commented Dec 30, 2016

Seems to fail sendheaders.py but not on my local machine... I'll look into it...


if (pindex->nStatus & BLOCK_HAVE_DATA) // Nothing to do here
return true;

LogPrintf("ChainWork check at height %d new: %s tip: %s\n",pindex->nHeight,pindex->nChainWork.GetHex(),chainActive.Tip()->nChainWork.GetHex());
if (pindex->nChainWork <= chainActive.Tip()->nChainWork || // We know something better
Contributor

Why not make this < instead of <=? Why do we avoid requesting compact blocks for competing best blocks?

Contributor

@morcos I couldn't see this added line in any of the 3 PRs you mentioned.

@@ -1893,24 +2022,41 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
fBlockReconstructed = true;
}
}
if (pindex->nHeight == 165) {
Contributor

What is the deal with block height 165?

map<uint256, pair<NodeId, list<QueuedBlock>::iterator> >::iterator it = mapBlocksInFlight.find(resp.blockhash);
if (it == mapBlocksInFlight.end() || !it->second.second->partialBlock ||
it->second.first != pfrom->GetId()) {
bool fExpectedBLOCKTXN = false;
Contributor

Very much approve of the variable namings chosen in various places.

// We seem to be rather well-synced, so it appears pfrom was the first to provide us
// with this block! Let's get them to announce using compact blocks in the future.
MaybeSetPeerAsAnnouncingHeaderAndIDs(nodestate, pfrom, connman);
if (nodestate->fSupportsDesiredCmpctVersion && vGetData.size() == 1 && mmapBlocksInFlight.size() == mmapBlocksInFlight.count(vGetData[0].hash) && pindexLast->pprev->IsValid(BLOCK_VALID_CHAIN)) {
Contributor

@rebroad rebroad Dec 31, 2016

Why not make this <= 2 rather than == 1 (for mapBlocksInFlight.size())? This way, if two blocks get announced in close proximity, we can request compact blocks for both of them (rather than a compact block for the oldest and a full block for the most recent).

@rebroad
Contributor

rebroad commented Dec 31, 2016

I like the coding style and elegance of this. Will help with testing.

@morcos
Contributor Author

morcos commented Dec 31, 2016

@rebroad As mentioned in the PR comment, this is built off 3 other PRs. All of your comments either belong on those PRs or are related to the last commit, which was just for debugging the travis failure and will be removed. I have left it there for now in case anyone else wants to see the error details.

@TheBlueMatt
Contributor

@morcos claimed on IRC the test failures might be (in part) due to the issue mentioned at #9375 (comment)

@rebroad
Contributor

rebroad commented Jan 1, 2017

@morcos Although most of the lines I am commenting on have not been introduced by you, given that you are changing the code near them, I thought it was a good opportunity to suggest these changes, as I believe they ought to be changed at some point, so perhaps they could be with this PR.

@morcos
Contributor Author

morcos commented Jan 1, 2017

Rebased with the newest commit from #9375 which fixes the failure.

For clarity, only the 5 commits on which I'm the author are meant for review here. The others are contained in the linked PRs...

@maflcko
Member

maflcko commented Jan 1, 2017

OT: Imo it makes sense to octomerge all pulls which this pull depends on and rebase the commits of this pull on top of the merge commit. Thus, the original commit hashes are preserved and it is clear what was old and should be reviewed in other pulls. Also, it is easier to see the fresh commits.

Member

@instagibbs instagibbs left a comment

pre-rebase utACK. I still think it might make sense to only allow a second request if an HB peer is responsible for one of them, but I think that kind of change is complementary on top of this.

if (fPeerWasFirstRequest) {
// We requested this block, but it's far into the future, so our
// mempool will probably be useless - request the block normally
// Only allow the first peer to request a full block
Member

"Only ask for the full block from the first peer we requested from"?

state->nBlocksInFlightValidHeaders -= itInFlight->second.second->fValidatedHeaders;
if (state->nBlocksInFlightValidHeaders == 0 && itInFlight->second.second->fValidatedHeaders) {
// Last validated block on the queue was received.
nPeersWithValidatedDownloads--;
Member

could just do the -= itInFlight->second.second->fValidatedHeaders as above to match.
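The bool-subtraction style being suggested relies on a bool converting to 0 or 1. A minimal standalone illustration (not the PR's code; the function name is invented for the example):

```cpp
#include <cassert>

// Illustration only: subtracting a bool decrements by exactly one when the
// condition holds, mirroring the `-= fValidatedHeaders` style above.
int DecrementIfLastValidated(int nPeersWithValidatedDownloads,
                             int nBlocksInFlightValidHeaders,
                             bool fValidatedHeaders)
{
    // Equivalent to:
    //   if (nBlocksInFlightValidHeaders == 0 && fValidatedHeaders) --count;
    nPeersWithValidatedDownloads -=
        (nBlocksInFlightValidHeaders == 0 && fValidatedHeaders);
    return nPeersWithValidatedDownloads;
}
```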

BlockDownloadMap::iterator itInFlight = range.first;
ClearDownloadState(itInFlight);
range.first++;
mmapBlocksInFlight.erase(itInFlight);
Member

since C++11 can also just capture the return value and set as range.first instead of worrying about iterator invalidation.
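A minimal sketch of that idiom (standalone, not the PR's code): since C++11, `multimap::erase(iterator)` returns the iterator following the erased element, so the loop can advance through the return value without pre-incrementing:

```cpp
#include <cassert>
#include <map>

// Erase every entry with the given key, advancing via erase's return value
// instead of incrementing the iterator before erasing (valid since C++11).
int EraseAllForKey(std::multimap<int, int>& m, int key)
{
    int nErased = 0;
    auto range = m.equal_range(key);
    while (range.first != range.second) {
        range.first = m.erase(range.first); // returns the next iterator
        ++nErased;
    }
    return nErased;
}
```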

@@ -105,7 +105,8 @@ namespace {
bool fValidatedHeaders; //!< Whether this block has validated headers at the time of request.
std::unique_ptr<PartiallyDownloadedBlock> partialBlock; //!< Optional, used for CMPCTBLOCK downloads
};
map<uint256, pair<NodeId, list<QueuedBlock>::iterator> > mapBlocksInFlight;
typedef std::multimap<uint256, pair<NodeId, list<QueuedBlock>::iterator>> BlockDownloadMap;
Contributor

I think indexing by NodeId might be better than using a multimap, because it would ensure that there couldn't be multiple entries for a block from the same node. I'd suggest:

typedef map<pair<uint256, NodeId>, list<QueuedBlock>::iterator> BlockDownloadMap;

This also would let you replace some of the while loops added here with direct lookups. I thought some of these (especially the while loop with the break statement setting fExpectedBLOCKTXN) were kind of confusing.

The downside of using a map is that you wouldn't have an equal_range method to call in the places that do require a loop. But you could replace those equal_range calls with calls to an equivalent helper function:

pair<BlockDownloadMap::iterator, BlockDownloadMap::iterator>
GetBlockDownloadRange(BlockDownloadMap& blocks, const uint256& hash) {
    return {blocks.lower_bound({hash, numeric_limits<NodeId>::min()}),
            blocks.upper_bound({hash, numeric_limits<NodeId>::max()})};
}

Contributor Author

I'm unsure about this change. In particular there are a lot of mmapBlocksInFlight.count(hash) calls that would get a bit less clear... I'll think about it some more

Contributor

Yeah, I noticed that in the later commits. You could have a count helper function returning std::distance(range.first, range.second). I do think a map is a better way to represent the data, but the C++ map implementation does make it a little awkward. Anyway, it's just something to consider.
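Assuming the pair-keyed map layout proposed earlier in this thread, that count helper could look like the following sketch. The simplified `Hash` and value types stand in for uint256 and the queued-block iterator; `CountBlockDownloads` is an invented name for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>
#include <utility>

// Simplified stand-ins: a real implementation would key on
// pair<uint256, NodeId> and store list<QueuedBlock>::iterator values.
using Hash = uint64_t;
using NodeId = int64_t;
using BlockDownloadMap = std::map<std::pair<Hash, NodeId>, int>;

// Count in-flight downloads for one block by measuring the contiguous key
// range [(hash, min NodeId), (hash, max NodeId)].
size_t CountBlockDownloads(const BlockDownloadMap& blocks, Hash hash)
{
    auto first = blocks.lower_bound({hash, std::numeric_limits<NodeId>::min()});
    auto last = blocks.upper_bound({hash, std::numeric_limits<NodeId>::max()});
    return static_cast<size_t>(std::distance(first, last));
}
```

Because the map is ordered by (hash, NodeId), all entries for one block are contiguous, so a lower_bound/upper_bound pair recovers the same range a multimap's equal_range would give.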

@@ -2292,6 +2292,18 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
}
pindexWalk = pindexWalk->pprev;
}
// Special case for second cmpctblock request of tip
Contributor

Could you expand this comment a little bit to describe the condition being checked? In particular I don't understand how the IsWitnessEnabled and fHaveWitness parts relate to making the request.

Contributor

Also, maybe consider just pulling the MSG_CMPCT_BLOCK setting a bit later down up to here and just requesting the block in this if statement (possibly before the while loop above).

@@ -1947,7 +1947,7 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,

if (pindex->nChainWork <= chainActive.Tip()->nChainWork || // We know something better
pindex->nTx != 0) { // We had this block at some point, but pruned it
if (fAlreadyInFlight) {
if (fInFlightFromSamePeer) {
Contributor

Is this change ("Only request full blocks from the peer we thought had the block in-flight") a change in behavior? Or is it just a cleanup after the previous multimap commit? It seems like this commit should be merged into the preceding or following one, or the commit message should be extended to say what the effect is, what motivates it.

@morcos
Contributor Author

morcos commented Jan 6, 2017

Rebased and I think done the way @MarcoFalke suggested.

Addressed feedback

@morcos
Contributor Author

morcos commented Jan 17, 2017

rebased

@da2ce7

da2ce7 commented Feb 7, 2017

#9375, #9252, and #9400 are merged, needs rebase.

…ight

This is a change in behavior so that if for some reason we request a block from a peer, we don't allow an unsolicited CMPCT_BLOCK announcement for that same block to cause a request for a full block from the uninvited peer (as some type of request is already outstanding from the original peer)
@morcos
Contributor Author

morcos commented Feb 14, 2017

rebased

@@ -274,6 +274,41 @@ void InitializeNode(CNode *pnode, CConnman& connman) {
PushNodeVersion(pnode, connman, GetTime());
}

// Requires cs_main
Contributor

Can you AssertLockHeld?

state->nStallingSince = 0;
}

// Requires cs_main.
Contributor

Can you AssertLockHeld?
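The idea behind replacing a "// Requires cs_main" comment with AssertLockHeld can be shown with a toy lock wrapper. Bitcoin Core's real macros also do lock-order tracking; everything below (TrackedLock, ASSERT_LOCK_HELD, GuardedRead) is an invented sketch, not the project's implementation.

```cpp
#include <cassert>

// Toy lock that records whether it is held, so a function can assert its
// locking precondition at runtime rather than documenting it in a comment.
class TrackedLock {
public:
    void lock() { m_held = true; }
    void unlock() { m_held = false; }
    bool held() const { return m_held; }
private:
    bool m_held = false;
};

#define ASSERT_LOCK_HELD(cs) assert((cs).held())

// Example of a function that enforces the lock instead of just saying so:
int GuardedRead(const TrackedLock& cs, int value)
{
    ASSERT_LOCK_HELD(cs); // aborts in debug builds if the caller forgot to lock
    return value;
}
```

The advantage over the comment is that a forgotten lock fails loudly in debug builds instead of silently racing.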

@@ -2292,6 +2292,18 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
}
pindexWalk = pindexWalk->pprev;
}
// Special case for second cmpctblock request of tip
Contributor

Also, maybe consider just pulling the MSG_CMPCT_BLOCK setting a bit later down up to here and just requesting the block in this if statement (possibly before the while loop above).

mmapBlocksInFlight.size() == mmapBlocksInFlight.count(pindexLast->GetBlockHash()) &&
mmapBlocksInFlight.count(pindexLast->GetBlockHash()) < MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK &&
!(pindexLast->nStatus & BLOCK_HAVE_DATA) &&
(!IsWitnessEnabled(pindexLast->pprev, chainparams.GetConsensus()) || State(pfrom->GetId())->fHaveWitness) &&
Contributor

In the strange case that a peer does not set the fHaveWitness service bit, but does announce compact blocks v2, I believe this line would result in a full block request. More generally, because the two if statements always have to be in sync to avoid this, I really prefer we pull the actual request logic into this if statement.

@@ -2072,8 +2086,7 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
// We want to be a bit conservative just to be extra careful about DoS
// possibilities in compact block processing...
if (pindex->nHeight <= chainActive.Height() + 2) {
if ((!fAlreadyInFlight && nodestate->nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER) ||
fInFlightFromSamePeer) {
if ((countPartialBlocksStarted < MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK && nodestate->nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER)) {
Contributor

The way I read this, the use of countPartialBlocksStarted, instead of a countBlocksStarted, means that we will request up to two compact blocks at a time, even if we are already requesting the full block from a peer. This seems strange to me, why not just max 2 in-flights at the same time for a given block, with the second never being a full block?

Contributor

Also, why drop the fInFlightFromSamePeer option? It looks like we'll never getblocktxn from two peers simultaneously?

!(pindexLast->nStatus & BLOCK_HAVE_DATA) &&
(!IsWitnessEnabled(pindexLast->pprev, chainparams.GetConsensus()) || State(pfrom->GetId())->fHaveWitness) &&
nodestate->fSupportsDesiredCmpctVersion) {
vToFetch.push_back(pindexLast);
Contributor

Further, simply adding the entry to vToFetch here may result in a null pointer dereference, I believe. If a peer announces two headers messages back-to-back, the first time we will MarkBlockAsInFlight to them, and the second time we'll hit this condition and add the entry to vToFetch again (which should also be fixed). Further down, we'll MarkBlockAsInFlight to them again, but MarkBlockAsInFlight requires that, if the block is already in-flight to the same peer, pit be something non-NULL as it will be dereferenced, but it is NULL in the call below.

Contributor

I believe the above needs fixing in three ways - MarkBlockAsInFlight needs to be more robust against NULL pit, the request needs to move into this if statement and skip the remainder of this block of code, and we shouldn't double-request from the same peer.

@@ -2094,6 +2107,7 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
return true;
} else if (status == READ_STATUS_FAILED) {
// Duplicate txindexes, the block is now in-flight, so just request it
// NOTE: This is the one place two full block requests can be outstanding
Contributor

OK, so why not just check fPeerWasFirstRequest and MarkBlockAsNotInFlight otherwise?

@@ -9,6 +9,8 @@
#include "net.h"
#include "validationinterface.h"

/** Maximum number of outstanding CMPCTBLOCK requests for the same block. */
static const int MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK = 2;
Contributor

Should this be configurable? Might make sense for miners to have this higher.

Contributor

Likely not: asking all your peers for a copy of the block poses the same network-DoS risks as connecting to hundreds of nodes, which people like to do because they believe it will help (though it usually actually hurts) them get their blocks out faster.

More importantly, if you're a miner and have good peers, I think it'd be somewhat rare for you to receive a third compact block announce before the first can respond to your blocktxn request, at least it will be once we get proper multi-threaded ProcessMessages implemented to respond to blocktxn requests for the latest block in the background (see #10652 for the beginnings of the steps to do so).

Contributor

What do you think the upper limit is before it would generally cause a negative impact? Maybe just have an upper limit like we do for max outbound connections.

Contributor

Probably around 2 :p. It's really only useful if your peer got stuck doing something and wasn't able to respond, or is being actively malicious. Once we've fixed the block-on-block-validation-before-responding-to-blocktxn-requests issue, it should be somewhat rare for this to help more than a very small amount.

@jtimon
Contributor

jtimon commented Sep 6, 2017

More concurrency! concept ACK

@TheBlueMatt
Contributor

This should likely be closed in favor of #10984.

@ryanofsky
Contributor

Should this still be closed in favor of #10984?

@morcos morcos closed this Nov 9, 2017
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
10 participants