-
Notifications
You must be signed in to change notification settings - Fork 37.7k
rpc: add 'getnetmsgstats', new rpc to view network message statistics #27534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
04e183d
to
210cf10
Compare
/** Move all messages from the received queue to the processing queue. | ||
* Also update the global map of network message statistics. | ||
*/ | ||
void MarkReceivedMsgsForProcessing(NetStats& net_stats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to pass a reference of NetStats to every CNode upon initialization?
Example: cd0c8ee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this review comment: #27257 (comment) - It looks like it would be preferable for CNode to have net_stats as a member variable (I could be wrong) but that might mean each CNode would need to have a shared_ptr to a NetStats instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great point. I think I need to take a closer look at some of the refactoring that's going on in net
in hopes of finding other examples/decisions that support making NetStats
a member variable in CNode
. I think the comment you linked to is very relevant.
ConnectedThroughNetwork(), | ||
m_conn_type, | ||
msg.m_type, | ||
msg.m_raw_message_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @dergoegge
@@ -49,6 +49,8 @@ const char *WTXIDRELAY="wtxidrelay"; | |||
const char *SENDTXRCNCL="sendtxrcncl"; | |||
} // namespace NetMsgType | |||
|
|||
const std::string NET_MESSAGE_TYPE_OTHER = "*other*"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was moved over from net.cpp. Should it be in the NetMsgType namespace? If so, what about the allNetMessageTypes array? There are places in this PR where I have to add +1 to account for the missing 'other' message type, and I'm not sure if it's appropriate for this to be a special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding it to the NetMsgType
namespace makes sense to me- the current name essentially manually namespaces it.
RE allNetMessageTypes
array - either way seems OK to me. from a quick glance at callers, I don't think you'd have to add much special case logic if you added it in the array, so that could be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few initial thoughts, mainly about code organization.
src/net.h
Outdated
* Placeholder for total network traffic. Split by direction, network, connection | ||
* type and message type (byte and message counts). | ||
*/ | ||
class NetStats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a good idea for this class to be in its own file/compilation unit, rather than added into net.h
(which is included in 40+ other files)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think having this code in a separate compilation unit + forward declaration of the class would be sufficient for net.h
, then net.cpp
could include the full file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Will work on this now (leaving this reply in case this comment disappears once I rebase)
src/net.h
Outdated
|
||
static Direction DirectionFromIndex(size_t index); | ||
static Network NetworkFromIndex(size_t index); | ||
static ConnectionType ConnectionTypeFromIndex(size_t index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if these to/from util helper methods are needed or if this is the right place for them rather than in their respective domains where available, like node/connection_type
and node/network
(the latter proposed in #27385).
Unrelated, can also make the declaration of new methods that return a value [[nodiscard]]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the questioning of where these util methods belong.
I didn't know about [[nodiscard]]
. Cool feature! Added it to the public *(to/from)Index
and private *ToIndex
methods by this comment. New commit after update + rebase: fc86267
I may end up adding it to more stuff as I work through the code, but did not do a full sweep of additional areas where it could be applied.
@@ -509,6 +511,396 @@ static RPCHelpMan getaddednodeinfo() | |||
}; | |||
} | |||
|
|||
namespace net_stats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps consider putting all this new code in its own rpc/netstats
files.
@@ -76,6 +77,9 @@ enum class ConnectionType { | |||
ADDR_FETCH, | |||
}; | |||
|
|||
/** Number of entries in ConnectionType. */ | |||
static constexpr size_t NUM_CONNECTION_TYPES{6}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps better to calculate this programmatically, so it doesn't need to be updated, as there are multiple pulls IIRC that propose to add more such types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the idea of programmatic calculation. I noticed NET_MAX in netaddress.h
accomplishes this by adding an enum value to the end of the list. That new value serves as a counter for all the enums. I initially implemented these (number of connection and message types) in that same manner because it seemed more programmatic. As long as new enums were inserted before, then I figured they would be accounted for in the total.
However, l Iater got feedback and the unreliability of mapping enums to integers was brought to my attention. Every single enum needs to be explicitly mapped to an integer, and if you're going to go through the trouble of that, then it comes out to be basically the same amount of work to make a standalone constant. There's also something that doesn't feel consistent about having an enum that accounts for the total living alongside the enum values that are being counted.
In this PR I have a static assert
checking that NUM_NET_MESSAGE_TYPES
is what we expect it to be. Would love if there was somehow a way to do that with the number of connection types as well. It's not exactly setting the value in a programmatic way, but it's at least providing some kind of check.
From src/protocol.cpp
:
static_assert(NUM_NET_MESSAGE_TYPES == sizeof(allNetMessageTypes) / sizeof(allNetMessageTypes[0]), "Please update NUM_NET_MESSAGE_TYPES");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@satsie is the concern of having a dummy value like NET_MAX
that it could fall out of sync if enum values start being explicitly assigned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amitiuttarwar sort of. It's two things:
-
If enum values are not explicitly assigned (which is the case for the
Network
enum right now), there is no guarantee thatNET_MAX
is what you think it will be. This came out of a conversation with @vasild during review earlier this year. It's my understanding thatNET_MAX
probably corresponds to the total number of enum values, but even today there are no guarantees on the value ofNET_MAX
. I'd like to do a little more research here and get some clear documentation on this though, as I'm currently just going off of information that was given to me. -
If you start explicitly assigning integers to each enum value, you still have to update
NET_MAX
, so I don't think it is in that much danger of falling out of sync, but it is the same amount of work as updating a constant likeNUM_CONNECTION_TYPES
here.
c0014f5
to
9269b74
Compare
Thank you for taking a look @jonatack! 🙏 All your comments on code organization are fair game and something I've wondered myself. Because I have absolutely zero point of reference for any of this, I'm going to to leave your comments about organization open with the hopes of getting more feedback. |
Concept tACK - left a few very minor comments, happy to keep testing and reviewing in more depth |
9269b74
to
1fc3504
Compare
Move NET_MESSAGE_TYPE_OTHER to protocol.h/cpp, where the rest of the message types live.
Add consts to keep track of the number of connection and message types. The number of message types should match the number of elements in the allNetMessageTypes array. Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
1fc3504
to
3e0a98c
Compare
Thank you for testing and taking a peek at this @ccdle12 🤗 Appreciate the review and will definitely be taking you up on the offer for a more in depth review as this evolves! 😉 EDIT: also wanted to mention I rebased. I'm guessing DrahtBot is going to remove the label? 👀 |
faaa417
to
e6e51e1
Compare
New data structure to hold the atomic message and byte counts of each network message statistic at the most granular level. Uses a 4D array that indexes by direction, network type, connection type, and message type. Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
Add to/fromIndex() methods to safely map enums to indexes. Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
In addition to per-peer stats, start recording stats (the number of messages and bytes) globally in `CConnman`. This happens every time messages are sent or received. The per-peer stats are lost when a peer disconnects. Also expose this object with a getter.
Aggregate stats to show zero or more of the following dimensions: direction, network, connection, message type The depth of the return object depends the number of dimensions specified to be shown. MultiLevelStatsMap makes this possible, and converts the object to a UniValue. Co-authored-by: Anthony Towns <aj@erisian.com.au> Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
Introduce a new getnetmsgstats rpc that is able to return network message statistics arranged in a number of different ways (direction, network, connection, message type). If no dimension types are specified to be shown, return the totals. Includes helper code to convert a string to a DimensionType enum. Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
e6e51e1
to
1b9fe3b
Compare
Moving to draft as I work through some feedback |
🐙 This pull request conflicts with the target branch and needs rebase. |
Is there a use case for this, for the typical user? If not, maybe it should be optional and disabled by default? |
@luke-jr, for sure there will be people that don't need this. But it is just an extra RPC, if somebody does not need it, then he/she will not call it. Similar stats are already provided in the |
Are you still working on this? |
@willcl-ark is picking this up and will open a follow up PR. Closing |
@willcl-ark, thank you! Here is a branch with some suggestions and ideas you may find useful: https://github.com/vasild/bitcoin/tree/getnetmsgstats |
Picked up in #28926. |
Introduce a new RPC,
getnetmsgstats
to retrieve network message statistics. This work addresses #26337. More information on the RPC design and implementation can be found in that issue.Massive thank-you to amitiuttarwar, vasild, and ajtowns for their help on this 🙏 Over the course of several months, they have patiently provided a tremendous amount of guidance and assistance in more ways than I can count!
getnetmsgstats RPC
Returns the message count and total number of bytes for sent and received network traffic. Results may optionally be arranged by direction, network, connection type and/or message type.
Arguments
show_only: an optional array of one or more dimensions to show as part of the results. Valid options are: direction, network, message_type, and connection_type. If no dimensions are specified, totals are returned.
Examples
Below are some examples on how the new
getnetmsgstats
RPC can be used.getnetmsgstats
getnetmsgstats '["conntype","msgtype"]'
getnetmsgstats '["network", "direction", "connection_type", "message_type"]'
Things to consider
RPC behavior: Should we allow dimensions to be rearraged?
When it comes time to gather up the RPC results, @vasild has provided an alternate implementation that uses an array instead of the
MultiMap
structure. This results in two changes:Aside from being good for memory, the reasoning here is allowing users to rearrange dimensions may be too confusing. I personally really like the ability to rearrange dimensions, which is why I have not integrated that solution into this PR, but am open to whatever the larger community prefers :)
Locking & Consistency Beyond Individual Values
With atomics, we know we can rely on the values for individual stats like bye count and message count. However, the byte count and message count may not always match. For example, let’s say we currently have 5 messages, and 100 bytes:
The read operation did not return accurate data! Unlike the torn write example for a single value, It returned data that was true for some point in time, it’s just that the values for message count and byte count were inconsistent.
To solve this, it’s actually not enough to lock the
MsgStat
struct. It's my understanding that you need a mutex to protect the entireNetStats
data structure.The trade off here isn’t attractive. A lock would halt network threads that are doing important stuff, all for the sake of consistent stats. Even for reads, we would have to stop writes. We’d end up stopping threads that are doing important things for something that is not that important.
Another thing to consider is how often will this RPC be called? If it’s once in a blue moon, then such a mutex is probably fine. But for a live dashboard that is making a call every second, this is not acceptable.
Making Enum Indices Safe and Reliable
src/net.cpp
contains a bunch of*To/FromIndex()
methods that convert an enum to an index number. I’ve decided to be explicit about these conversions because enums are not guaranteed to start at zero and it’s not enough to simply associate the first value in an enum with a number.To protect against potential gaps or overlaps in numbering, every single enum value must be assigned an index. This is the only way to guarantee that the numbering is safe and reliable.
Instead of updating the existing
Network
andConnectionType
enums to assign indices to each enum value, and risk unintentionally modifying the behavior of code that uses these enums, I’ve opted for the conversion methods. This also narrows the scope of the conversion methods, making changed behavior easier to spot if the indices are modified.I’m interested in feedback on this. The
*To/FromIndex()
methods have low review and maintenance cost, but I’m unsure if I’ve correctly evaluated the risks associated with updating theNetwork
andConnectionType
enums . Also open to discussion on if there is a better place for these conversion methods to live.