-
Notifications
You must be signed in to change notification settings - Fork 37.7k
p2p: always provide CNodeStateStats and getpeerinfo/netinfo/gui updates #25923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
30b2ea7
to
7ee6f76
Compare
Can you explain why this is a "bugfix", considering that the API break was done on purpose at that time? |
Concept ACK @MarcoFalke, a bug is a bug, even if it was done on purpose. @jnewbery can clarify here, but I do not think this was even purposeful. My guess is that this breakage was unwanted, but acceptable in order to get the refactor from #21160 done. It later required fixup attempts like #24691 and #25176. So, what happened? The #24691 documented the fields as optional, but that is not appropriate because in the previous version they were not-optional. When users wrote their apps they assumed the fields will always be present. I think the "optional" flag should only be added when a field is initially introduced. Adding it later is the same as dropping the field - it breaks the "contract" with the user that the field will be present. On this PR: I think it is ok to display the values |
This isn't true. None of the fields in PR 22798 was opened after #21160, then merged, and resulted in the silent conflict where the fields were not marked as optional. That could be caught by better testing of optional fields.
Stop misrepresenting other people's words. This was in response to a review comment about the GUI, not the RPC. You've also clipped the quote to misrepresent it further. Here's the full sentence: "will almost always be true, so this isn't a big issue (if it was, then the other stats in nodeStateStats not being available would also be an issue)." I stand by that - if it's a problem that one of the fields is not visible in the GUI for a very short window during peer connection, then it'd be a problem that the other fields like "startingheight" or "minping" are not present. It's clearly not a problem, since that's always been the case. At the time of the comment, it was also totally consistent with the other fields that were optional but not marked as such, since #22798 had not yet been opened.
Again, this is false. The 'optional' documentation was added in #22798, which was only included in v0.23. Until then, there were many fields could be considered optional but not documented as such. Users writing apps that interacted with the field (including the bitcoin-cli netinfo app) should have made those apps tolerant to the fields not being present. As can be seen in #22798, there were 10s of fields across many different RPCs that were 'optional' before being explicitly documented as such. Both the PR description and @vasild's comment make it sound like this was some nefarious intentional API break, or some egregious bug. It's neither. Frankly, it's utterly tedious to have to spend time rebutting false statements and misrepresentations made about PRs that were merged months ago. Unfortunately, this 'gotcha' attitude of trying to shame code authors seems to be becoming more common in this project. It's extremely harmful for productive collaboration, and is certainly not something I'd expect to see from someone putting themselves forward as a potential project maintainer. |
My goodness, that's saddening to read. The only goal should be to improve Bitcoin. Let's please remember to assume good intentions, focus on the code and improving Bitcoin and leave the other stuff behind. Thank you! |
Alright, by "documented" I meant that since it is not documented as optional, then it must be mandatory, but I was not aware that optional fields were introduced later (#22798). Thanks for bringing it up. This changes the perspective here.
In the RPC case you referred to the GUI comment: As above for the gui peer console. Anyway, I was just trying to clarify the situation here, not misinterpret, gotcha or shame. I am having coronavirus and should probably not be at the computer today at all, upsetting people. Will be more careful with my wording in the future. Thank you! |
Concept ACK |
(Am considering closing this draft and opening a new one. If there are any helpful, friendly suggestions in the interim to improve the change, that would be great |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm not convinced either way about what the right behaviour is here -- if a connection is still being setup, it seems more useful to indicate that via getpeerinfo
or -netinfo
than just treating it as if it will be like most connections are. On the other hand, that's a bunch of effort for unusual special cases, so 🤷 ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While there are some fields that can obviously be made non-optional (such as the fee filter), because they can be absent on the wire even long after the connection has been spun up (thus require a default value anyway), there are other fields where I think it would be problematic to make them non-optional. For example, the service flags can not change after the handshake, so returning different service flags for the same peer id in different RPC invocations might cause issues in callers that assume they can't change. I see that you didn't change service flags in this pull, but the same should hold for txrelay, unless the user has bloom filters enabled.
So overall I don't think this can be "sold" as a bugfix of some kind, but instead should be done by evaluating each field on a case-by-case basis whether it makes more sense to be optional or not. The behaviour change should then be motivated based on that rationale and not to restore some underdocumented/misdocumented interface of a past release.
stats.m_starting_height = peer->m_starting_height; | ||
const PeerRef peer = GetPeerRef(nodeid); | ||
if (peer == nullptr) { | ||
return stats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK to returning partial data rather than none at all.
7ee6f76
to
cdad951
Compare
Rebased and updated to take review feedback by @ajtowns and @luke-jr and for the merge of #25717, maintaining the tristate Thanks for the input, everyone. I'm agnostic to how we describe this change, as can be seen in the PR title and description history. I've re-updated both. I agree with evaluating each field on a case-by-case basis and it's what I've tried to do. History:
Before adding the "optional" mentions to the RPC help docs, I believe we would often mention that fields were optional in the description, see 5ba829e for an example. IIUC the only behavior change here where there may be varying opinions is to keep My thinking is that this is the 8th year it has always been present, e.g. in a majority of our releases (~15) until now. That long-established behavior is a contract I would prefer to maintain, absent a compelling reason not to do so, in which case it might be good to warn our user space about the change in a release note. It is true that both (a) the Regarding these last two points, I'll leave it to reviewers to weigh in on which way to go. |
You're linking to documentation that you added to a single RPC. By comparison look at #22798, which adds over 120 optional type arguments to existing rpcs. A large majority of "optional" type arguments were not documented as such until September 2021/v0.23. The optional type itself was not added until #17809 in 2020, many years after these return fields were included in the RPC (and not documented as optional).
No. If it's not documented, it's not a contract. Anyone writing clients using the RPC interface would have needed to make the clients accepting of fields being missing, since that's always been possible for a large number of the results fields in a large number of the RPCs. I agree with Marco that it makes sense for some fields to be non-optional, so some of the changes here some beneficial, but please stick to the facts in describing this change. |
Thanks @jnewbery. I'm agnostic to what word is used to describe the 8 years of |
-BEGIN VERIFY SCRIPT- rename() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ./src ./test); } rename nSyncHeight m_sync_height rename nCommonHeight m_common_height rename vHeightInFlight m_height_in_flight rename presync_height m_presync_height rename their_services m_services -END VERIFY SCRIPT-
as it can only be used in a couple of instances rather than in all of them, thereby providing little benefit and increasing confusion
b5255ca
to
25fa0d5
Compare
Rebased and WIP push (likely still need to add a mutex for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 25fa0d5
Edit, just not to forget:
likely still need to add a mutex for
m_starting_height
or std::atomic<std::optional<int>> m_starting_height
Also consider: #25923 (comment)
std::optional<int> m_starting_height; | ||
std::chrono::microseconds m_ping_wait{0us}; | ||
std::vector<int> m_height_in_flight; | ||
std::optional<bool> m_relay_txs{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will initialize the m_relays_txs
so that it has a value. Given that we never assign an empty optional to it, this means that calls to m_relay_txs.has_value()
will always be true and thus we don't need std::optional
for it. Should it be something like this:
diff --git i/src/net_processing.h w/src/net_processing.h
index 2d7068f4c8..55ff27c546 100644
--- i/src/net_processing.h
+++ w/src/net_processing.h
@@ -28,13 +28,13 @@ static const int DISCOURAGEMENT_THRESHOLD{100};
struct CNodeStateStats {
std::optional<int> m_sync_height;
std::optional<int> m_common_height;
std::optional<int> m_starting_height;
std::chrono::microseconds m_ping_wait{0us};
std::vector<int> m_height_in_flight;
- std::optional<bool> m_relay_txs{false};
+ bool m_relay_txs{false};
CAmount m_fee_filter_received{0};
uint64_t m_addr_processed = 0;
uint64_t m_addr_rate_limited = 0;
bool m_addr_relay_enabled{false};
std::optional<ServiceFlags> m_services;
std::optional<int64_t> m_presync_height;
diff --git i/src/qt/rpcconsole.cpp w/src/qt/rpcconsole.cpp
index 49402b0264..270076a780 100644
--- i/src/qt/rpcconsole.cpp
+++ w/src/qt/rpcconsole.cpp
@@ -1198,12 +1198,15 @@ void RPCConsole::updateDetailWidget()
ui->peerCommonHeight->setText(stats->nodeStateStats.m_common_height.has_value() ? QString("%1").arg(stats->nodeStateStats.m_common_height.value()) : ts.unknown);
ui->peerHeight->setText(stats->nodeStateStats.m_starting_height.has_value() ? QString::number(stats->nodeStateStats.m_starting_height.value()) : ts.unknown);
ui->peerPingWait->setText(GUIUtil::formatPingTime(stats->nodeStateStats.m_ping_wait));
ui->peerAddrRelayEnabled->setText(stats->nodeStateStats.m_addr_relay_enabled ? ts.yes : ts.no);
ui->peerAddrProcessed->setText(QString::number(stats->nodeStateStats.m_addr_processed));
ui->peerAddrRateLimited->setText(QString::number(stats->nodeStateStats.m_addr_rate_limited));
+ // if m_relay_txs is an optional that always has value, then this is equivalent to
+ // m_relay_txs.has_value() ? ts.yes : ts.no
+ // which will _always_ print ts.yes
ui->peerRelayTxes->setText(stats->nodeStateStats.m_relay_txs ? ts.yes : ts.no);
ui->peersTabRightPanel->show();
}
void RPCConsole::resizeEvent(QResizeEvent *event)
{
diff --git i/src/rpc/net.cpp w/src/rpc/net.cpp
index c688bd658f..9f76f5e41d 100644
--- i/src/rpc/net.cpp
+++ w/src/rpc/net.cpp
@@ -196,15 +196,13 @@ static RPCHelpMan getpeerinfo()
if (stats.m_mapped_as != 0) {
obj.pushKV("mapped_as", uint64_t(stats.m_mapped_as));
}
const ServiceFlags services{statestats.m_services.has_value() ? statestats.m_services.value() : ServiceFlags::NODE_NONE};
obj.pushKV("services", strprintf("%016x", services));
obj.pushKV("servicesnames", GetServicesNames(services));
- if (statestats.m_relay_txs.has_value()) {
- obj.pushKV("relaytxes", statestats.m_relay_txs.value());
- }
+ obj.pushKV("relaytxes", statestats.m_relay_txs);
obj.pushKV("lastsend", count_seconds(stats.m_last_send));
obj.pushKV("lastrecv", count_seconds(stats.m_last_recv));
obj.pushKV("last_transaction", count_seconds(stats.m_last_tx_time));
obj.pushKV("last_block", count_seconds(stats.m_last_block_time));
obj.pushKV("bytessent", stats.nSendBytes);
obj.pushKV("bytesrecv", stats.nRecvBytes);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vasild! That was an oversight in my last WIP push, will update.
LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d (startheight:%d)\n", | ||
pindexLast->nHeight, pfrom.GetId(), peer.m_starting_height); | ||
pindexLast->nHeight, pfrom.GetId(), peer.m_starting_height.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If m_starting_height
has no value the call to value()
will throw an exception. I do not feel comfortable either assuming that it will always have value here (also considering future changes) or that throwing an exception is ok. Some of this code is surrounded by try/catch
higher up, but we may move pieces of code around, possibly moving this printout elsewhere. Maybe do it more defensively:
LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d%s\n",
pindexLast->nHeight,
pfrom.GetId(),
peer.m_starting_height.has_value() ? strprintf(" (startheight:%d)", peer.m_starting_height.value()) : "");
(here and in the other printouts)
std::get<1>(node_stats) = | ||
m_context->peerman->GetNodeStateStats(std::get<0>(node_stats).nodeid, std::get<2>(node_stats)); | ||
} | ||
TRY_LOCK(::cs_main, lockMain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not finished with my review yet, but I think the changes to the GUI have a drawback:
TRY_LOCK
is used here in transmitting the info to the gui, meaning that we'll abort (and now return a default-initialized CNodeStateStats) if cs_main
is not available.
As a result of this, when I run the GUI with this branch and open the Peers window the peer info sometimes changes to the default-initialized values (it's as if the screen "flickers"), and with the next refresh it's back to the old (and correct) values.
If the node does something that's locking-intensive like IBD, the flickering becomes a lot more frequent and annoying.
I think it might be better not to update the GUI fields if we can't get the lock, instead of returning a default-initialised object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not the same in master
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in master, we return fNodeStateStatsAvailable==false
and don't update the fields if we can't get the lock (code) instead of updating them with default-values. fNodeStateStatsAvailable
is being removed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, reproduced the issue you describe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some observations re: the RPC changes (I think the reporting to the GUI should be discussed separately, see my comment above):
-
As far as I can see, before this PR, the only situation in which
CNodeStateStats
was not available (fStateStats==false
) is a race condition during peer disconnection, becausegetpeerinfo
does not fetchCNodeStats
andCNodeStateStats
atomically - the node may have removed the peer in between the two calls. But this should not be possible during peer initialization because of the order in which the objectsCNode
,Peer
andCNodeState
are initialised. -
I think that this behavior during disconnection is unintended, a bug that should be fixed. One way to fix it would be to not report an incomplete peer info in this case (#26515), another possibility might be to prevent the node from disconnecting a peer in between these two calls, i.e. change the way
getpeerinfo
fetches data - which might also fix the related issue that data fromCNodeStateStats
might be always present but change after fetchingCNodeStats
data, leading to an inconsistent response. -
As a result of this, there currently seems to be no legitimate reason why we should fail to fetch an entire
CNodeStateStats
object when we have aCNodeStats
object, or why fields fromCNodeStateStats
need to be any more optional than field fromCNodeStats
. During the lifetime of a connection, all three internal objectsCNode
,Peer
andCNodeState
should always be available. -
It is an unrelated issue that some of the
getpeerinfo
fields (whichever internal object they may belong to) are not known during peer initialization because our peer hasn't given us enough info yet - the old strategy has been to return default-initialized values, such as-1
, but include the field. This PR makes many of these fields optional, by leaving them out of thegetpeerinfo
response - so it changesgetpeerinfo
behavior during startup. This should become very clear when this PR gets rebased wrt master and the newly added test in #26519 needs to be adjusted. -
I don't have a strong opinion on whether this is desirable (are there any general recommendation for this, or examples from other RPCs?), however note that some fields (e.g.
presynced_headers=-1
) are still left default-initialized, while others aren't. I also think that if we make the fields truly optional, it might be good to mention in the RPC help in which situations they are not reported.
6fefd49 rpc: Require NodeStateStats object in getpeerinfo (Martin Zumsande) Pull request description: The objects `CNode`, `CNodeState` and `Peer` store different info about a peer - `InitializeNode()` and `FinalizeNode()` make sure that for the duration of a connection, we should always have one of each for a peer. Therefore, there is no situation in which, as part of getpeerinfo RPC, `GetNodeStateStats()` (which requires a `CNodeState` and a `Peer` entry for a `NodeId` to succeed) could fail for a legitimate reason while the peer is connected - this can only happen if there is a race condition between peer disconnection and the `getpeerinfo` processing (see also a more detailed description of this in #26457 (review)). But in this case I think it's better to just not include the newly disconnected peer in the response instead of returning just parts of its data. An earlier version of this PR also made the affected `CNodeStateStats` fields non-optional (see mzumsande@5f900e2). Since this conflicts with #25923 and should be a separate discussion, I removed that commit from this PR. ACKs for top commit: dergoegge: Approach ACK 6fefd49 MarcoFalke: review ACK 6fefd49 👒 Tree-SHA512: 89c8f7318df4634c1630415de9c8350e6dc2d14d9d07e039e5b180c51bfd3ee2ce99eeac4f9f858af7de846f7a6b48fcae96ebac08495b30e431a5d2d4660532
🐙 This pull request conflicts with the target branch and needs rebase. |
…ateStats 6fefd49 rpc: Require NodeStateStats object in getpeerinfo (Martin Zumsande) Pull request description: The objects `CNode`, `CNodeState` and `Peer` store different info about a peer - `InitializeNode()` and `FinalizeNode()` make sure that for the duration of a connection, we should always have one of each for a peer. Therefore, there is no situation in which, as part of getpeerinfo RPC, `GetNodeStateStats()` (which requires a `CNodeState` and a `Peer` entry for a `NodeId` to succeed) could fail for a legitimate reason while the peer is connected - this can only happen if there is a race condition between peer disconnection and the `getpeerinfo` processing (see also a more detailed description of this in bitcoin#26457 (review)). But in this case I think it's better to just not include the newly disconnected peer in the response instead of returning just parts of its data. An earlier version of this PR also made the affected `CNodeStateStats` fields non-optional (see mzumsande@5f900e2). Since this conflicts with bitcoin#25923 and should be a separate discussion, I removed that commit from this PR. ACKs for top commit: dergoegge: Approach ACK 6fefd49 MarcoFalke: review ACK 6fefd49 👒 Tree-SHA512: 89c8f7318df4634c1630415de9c8350e6dc2d14d9d07e039e5b180c51bfd3ee2ce99eeac4f9f858af7de846f7a6b48fcae96ebac08495b30e431a5d2d4660532
What is the status of this given recently merged changes, and the commentary above? |
I'm reviewing other pulls ATM but will evaluate the scope here, starting from simply updating the statestats struct with std::optionals on up to the other changes. |
Ok. Moved to draft for now, given it's currently a work in progress. Feel free to undraft when you've decided on a scope, rebased, answered outstanding queries etc. |
Closing temporarily so that I can re-open it -- am still interested to finish this. |
update CNodeStateStats members to be std::optional or member-initialized with default values
return CNodeStateStats and NodeStats directly rather than a bool
make some getpeerinfo and gui peers fields always provided rather than optional: addr_processed, addr_rate_limited, addr_relay_enabled, inflight
remove NodesStats type which causes more confusion than benefit