net, pcp: handle multi-part responses and filter for default route while querying default gateway #32159
cc @laanwj This patch maintains the FreeBSD-style querying/filtering we have currently, but increases the size of the response processed to a maximum of 1MB.
Concept ACK
Edit: removed the milestone. I still think it'd be nice to have in the next 29.x release, but it turns out more complicated and riskier than expected; better to have no immediate time pressure.
std::memcpy(&gw, RTA_DATA(rta_gateway), sizeof(gw));
return CNetAddr(gw, scope_id);
for (nlmsghdr* hdr = (nlmsghdr*)response; NLMSG_OK(hdr, recv_result); hdr = NLMSG_NEXT(hdr, recv_result)) {
    if (hdr->nlmsg_type == NLMSG_DONE) {
Is it guaranteed that the response to NLM_F_DUMP is always multipart? Or do we need to check nlmsg_flags for NLM_F_MULTI, and if not, break after the first packet?
This is not clear to me from the documentation:
I don't think it is, but like you am unsure about what the guarantees of the protocol are here.
I kind of reverse engineered this looking at miniupnpc and netlink sources, along with an strace of `ip route show`, which is where the repeated recv calls jumped out to me as a difference between our code and other tools.
This approach simply relies on receiving an NLMSG_DONE to signal the end of the response and break. This should handle both single and multi-part messages. Here's a snipped sample of `strace -e filter=recvmsg ip route show` on my system:
Details
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 2932
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1444, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1743240135, nlmsg_pid=3248093}, <snip> 0) = 2932
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 3384
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1504, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1743240135, nlmsg_pid=3248093}, <snip> 0) = 3384
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 5196
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1488, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1743240135, nlmsg_pid=3248093}, <snip> 0) = 5196
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1743240135, nlmsg_pid=3248093}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 424
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=76, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1743240136, nlmsg_pid=3248093}, <snip> 0) = 424
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 600
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=72, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1743240136, nlmsg_pid=3248093}, <snip> 0) = 600
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1743240136, nlmsg_pid=3248093}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
NLM_F_MULTI is indeed set, even in the NLMSG_DONE packet.
One other thing I did read about, but not implement, is checking the sequence number on each message to ensure it was meant for our request. But as we only make a single request, I thought this should be OK to omit.
Thanks!
The receive flow here seems to indicate:
- Receive packet
- Process packet
- If NLM_F_MULTI is set and the packet is not NLMSG_DONE, repeat

What I'm mostly worried about is that the current code will hang if NLMSG_DONE never comes, which seems to be the case for non-multi responses, which have one data packet.
But it may be that the NETLINK_ROUTE response to RTM_GETROUTE/NLM_F_DUMP is always multi-packet. That empirically seems to be the case even for tiny routing tables.
Looking at the ip source is a good idea. We also need to verify this with FreeBSD.
But as we only make a single request i thought this should be OK to omit.
Agree, going that far in checking seems unnecessary. I don't think we need super defensive coding, as netlink is a local protocol with the kernel.
Thanks, that infradead page is very handy!
I made a few changes in 6c694c2 based on your thoughts (and the protocol page):
- set the socket to non-blocking mode to avoid hanging if the kernel doesn't send a response
- use a vector to collect all data from multi-part responses
- exit when recv() returns 0 (this should handle single-part messages, AFAICT)

I think relying on receiving no more data from recv() to break the receive loop should be as (or perhaps more) robust than checking for the NLM_F_MULTI flag and exiting after the first receive if it's not set, but curious what you think here?
If it would help, I'd be happy to break this into a few smaller commits, as I'm kinda feeling this change contains a few different changes in one in some ways now...
I think relying on receiving no more data from recv() to break the receive loop should be as (or perhaps more) robust than checking for the NLM_F_MULTI flag and exiting after first receive if it's not set, but curious what you think here?
Maybe. Is it safe to assume that netlink will never block?
We don't want to end up in the same situation as before, where we miss data, due to say, a threading race condition.
I think the safest thing here is to mimic as closely as possible ip's behavior, as it is the only tool these kinds of interfaces tend to be written towards.
use a vector to collect all data from multi-part responses
I'm not sure I see the motivation here. Parsing the packets as separate units is just as valid ("Multipart messages, unlike fragmented IP packets, must not be reassembled"), avoids dynamic allocation, and is simpler.
checking of the sequence number on each message to ensure it was meant for our request
This seems like a good idea to at least do in debug builds.
It seems like a good precaution to check for the presence of NLM_F_MULTI and not wait for NLMSG_DONE if it isn't set. At least from my naive reading of https://man7.org/linux/man-pages/man7/netlink.7.html, it seems NLMSG_DONE is only used for multipart messages.
Splitting into multiple commits would be useful, e.g. one commit that switches to non-blocking mode.
Is it guaranteed that the response to NLM_F_DUMP is always multipart? Or do we need to check nlmsg_flags for NLM_F_MULTI, and if not, break after the first packet?
It seems that in the Linux kernel, that is the case. A message type's (e.g. GETROUTE) handling callback struct has a few callback functions; the .dumpit callback is called when you send a request with NLM_F_DUMP. Checking the .dumpit callbacks, the ones I find set NLM_F_MULTI in their responses. For example, inet_dump_ifaddr, which is a wrapper for inet_dump_addr, is the .dumpit for RTM_GETADDR for IPv4 addresses, and inet_dump_addr sets NLM_F_MULTI. The same is true for the NLMSG_DONE that comes when the dump is complete:
static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
                             struct netlink_callback *cb,
                             struct netlink_ext_ack *extack)
{
    // [...]
    nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(nlk->dump_done_errno),
                           NLM_F_MULTI | cb->answer_flags);
    // [...]
}
Grepping for NLMSG_DONE, I think netlink_dump_done is the only place NLMSG_DONE is used that's relevant to us; all of the other instances are in other netlink subsystems. So we only get NLMSG_DONE in reply to NLM_F_DUMP, but I think this is the case even when we don't get a multipart message, since netlink_dump_done is always invoked when handling a request with NLM_F_DUMP, I believe.
Force-pushed from c5211f3 to 6c694c2 (compare).
Although the code this PR touches isn't compiled on macOS, I did briefly check that things still work there. I also briefly tested on Ubuntu 24.10.
Left some inline questions to wrap my head around the changes and refresh my memory of the original...
int64_t recv_result;
do {
    recv_result = sock->Recv(temp, sizeof(temp), 0);
} while (recv_result < 0 && (errno == EINTR || errno == EAGAIN));
Since you're touching this line... According to the internet, we should also check EWOULDBLOCK even though it's usually the same as EAGAIN, and it's likely not relevant for any system we support.
https://stackoverflow.com/a/49421517
Might be helpful (lines 22 to 25 in cdc3299):
static inline bool IOErrorIsPermanent(int err)
{
    return err != WSAEAGAIN && err != WSAEINTR && err != WSAEWOULDBLOCK && err != WSAEINPROGRESS;
}
src/common/netif.cpp
Outdated
char temp[4096];
int64_t recv_result;
do {
    recv_result = sock->Recv(temp, sizeof(temp), 0);
I know this is existing code, but I don't recall why there's no timeout here. And also, should there be a quick wait between Recv calls?
6c694c2
to
92343a6
Compare
Filter netlink responses to only consider default routes by checking the destination prefix length (rtm_dst_len == 0). Previously, we selected the first route with an RTA_GATEWAY attribute, which for IPv6 often resulted in choosing a non-default route instead of the actual default. This caused occasional PCP port mapping failures because a gateway for a non-default route was selected.
This shouldn't usually be hit, but is a good belt-and-braces.
d878853
to
4c53178
Compare
🚧 At least one of the CI tasks failed. Hints: try to run the tests locally, according to the documentation. However, a CI failure may still [...]
Leave a comment here, if you need help tracking down a confusing failure.
After further investigation I gained some new insights.
Therefore I have re-worked and split the remaining changes into 3 commits.
I think these are all worthwhile, but I could see an argument that what we currently have works "well enough" for the most basic use-case: a simple (single) routing table. And we could just consider anything else out-of-scope in this project.
@Sjors: Ref your comments, the socket is set as [...] (lines 536 to 540 in 74d9598).
@laanwj you might be interested in circling back here to review?
Yes, will look into it.
Removed the backport label as this was confirmed not to be a regression (thanks!).
Put this on the [...]
ACK 4c53178. Verified against iproute2 and strace.
src/common/netif.cpp
Outdated
rtmsg* r = (rtmsg*)NLMSG_DATA(hdr);
int remaining_len = RTM_PAYLOAD(hdr);
These could be moved after `if (hdr->nlmsg_type == NLMSG_DONE) {` to avoid the integer sanitizer warning in #33245.
// Only consider default routes (destination prefix length of 0).
if (r->rtm_dst_len != 0) {
    continue;
}
In 57ce645 (net: filter for default routes in netlink responses)
Just info for other reviewers; I found this useful since I'm generally unfamiliar with how routing and gateways work:
As I understand it, the destination prefix is a mask indicating which routes the gateway can be used for. A destination prefix of 0 (alternatively 0.0.0.0 or ::0) indicates that the gateway can be used for all destinations, and RFC 1812 offers this right above the beginning of section 5.2.4.4:
(5) Default Route: This is a route to all networks for which there are no explicit routes. It is by definition the route whose prefix length is zero.
Relevant (editorialized) code from iproute2:
if (tb[RTA_DST]) {
    // [...if the route has a specific destination, e.g.] ip route add 192.168.0.5 via 192.168.0.4
} else if (r->rtm_dst_len) {
    // [if the route's prefix has a non-zero length]
    snprintf(b1, sizeof(b1), "0/%d ", r->rtm_dst_len);
} else {
    // [if rtm_dst_len == 0]
    strncpy(b1, "default", sizeof(b1));
}
@@ -97,6 +97,10 @@ std::optional<CNetAddr> QueryDefaultGatewayImpl(sa_family_t family)
     rtmsg* r = (rtmsg*)NLMSG_DATA(hdr);
     int remaining_len = RTM_PAYLOAD(hdr);
 
+    if (hdr->nlmsg_type != RTM_NEWROUTE) {
+        continue; // Skip non-route messages
+    }
In 42e99ad (net: skip non-route netlink responses)
The man 7 rtnetlink page (https://man7.org/linux/man-pages/man7/rtnetlink.7.html) does not really offer much insight here. But as far as I can tell, the convention comes from the fact that the rtnetlink interface is also used for monitoring changes to the kernel's routing table. Because the userspace application is trying to maintain an idea of the kernel routing table's state, the interfaces are symmetrical: a userspace application sends an RTM_NEWROUTE when it wants to insert a new route, and likewise, whenever the kernel inserts a new route, it sends out an RTM_NEWROUTE message over open NETLINK_ROUTE sockets. So the reply to an RTM_GETROUTE follows this convention and replies with RTM_NEWROUTE, and I imagine the same is true for GETADDR and GETLINK.
Looking through the Linux kernel's implementation for evidence of this:
e.g. inet_rtm_getroute is the kernel's callback that handles RTM_GETROUTE messages for IPv4 routes, and here is a simplified + annotated view of the relevant section of that function:
static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
// [....]
// [ This is never true for us. ]
if (rtm->rtm_flags & RTM_F_FIB_MATCH) {
// [ fib_dump_info populates sk_buff* skb, which holds packet data,
// and places the `event` arg in the nlmsg hdr->nlmsg_type
// https://github.com/torvalds/linux/blob/39f90c1967215375f7d87b81d14b0f3ed6b40c29/net/ipv4/fib_semantics.c#L1732
// ]
fib_dump_info(skb, NETLINK_CB(in_skb).portid,
nlh->nlmsg_seq,/*event=*/RTM_NEWROUTE, &fri, 0);
} else {
// [ This case is relevant for us, rt_fill_info always creates an RTM_NEWROUTE message.]
// https://github.com/torvalds/linux/blob/39f90c1967215375f7d87b81d14b0f3ed6b40c29/net/ipv4/route.c#L2951
rt_fill_info(net, dst, src, rt, table_id, res.dscp, &fl4,
skb, NETLINK_CB(in_skb).portid,
nlh->nlmsg_seq, 0);
}
// [ Broadcast the `skb` on the socket. ]
rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
}
// [ https://github.com/torvalds/linux/blob/39f90c1967215375f7d87b81d14b0f3ed6b40c29/net/ipv4/route.c#L2939-L2951 ]
static int rt_fill_info(
// [...]
)
{
nlh = nlmsg_put(skb, portid, seq, RTM_NEWROUTE, sizeof(*r), flags);
}
Relevant section of iproute2: if a response to getroute is not of type RTM_NEWROUTE, it's an error:
static int iproute_get(int argc, char **argv)
// [... `Populate answer` ]
struct rtmsg *r = NLMSG_DATA(answer);
if (answer->nlmsg_type != RTM_NEWROUTE) {
fprintf(stderr, "Not a route?\n");
delete_json_obj();
free(answer);
return -1;
}
}
for (rtattr* attr = RTM_RTA(r); RTA_OK(attr, remaining_len); attr = RTA_NEXT(attr, remaining_len)) {
    if (attr->rta_type == RTA_GATEWAY) {
        rta_gateway = attr;
    } else if (attr->rta_type == RTA_OIF && sizeof(int) == RTA_PAYLOAD(attr)) {
For context on this: IPv6 allows "scoped addresses"; e.g. I can send a packet to a non-globally unique address, like a link-local address that has two destinations on two different links. So, in order to disambiguate scoped addresses, RFC 4007 section 6 recommends the use of unique zone indexes:
Because the same non-global address may be in use in more than one zone of the same scope (e.g., the use of link-local address fe80::1 in two separate physical links) and a node may have interfaces attached to different zones of the same scope (e.g., a router normally has multiple interfaces attached to different links), a node requires an internal means to identify to which zone a non-global address belongs. This is accomplished by assigning, within the node, a distinct "zone index" to each zone of the same scope to which that node is attached, and by allowing all internal uses of an address to be qualified by a zone index.
And RFC4007#section-7 suggests that specifying an output interface index as the zone index is sufficient, but more specific than necessary to disambiguate:
When an upper-layer protocol sends a packet to a non-global destination address, it must have a means of identifying the intended zone to the IPv6 layer for cases in which the node is attached to more than one zone of the destination address's scope.
Although identification of an outgoing interface is sufficient to identify an intended zone (because each interface is attached to no more than one zone of each scope), in many cases that is more specific than desired.
The name used in the rtnetlink interface (RTA_OIF) suggests that the unique scope ID / zone ID used is the output interface index (OIF), but I guess that's an implementation detail. The point is, it's a unique index that makes an IPv6 address globally unique from the POV of a given device.
src/common/netif.cpp
Outdated
processed_one = true;
if (hdr->nlmsg_flags & NLM_F_MULTI) {
    multi_part = true;
}
In 4c53178 (net: handle multi-part netlink responses)
nit: processed_one, multi_part, and the check at the bottom of the outer for loop could be dropped, with the following change:
-processed_one = true;
-if (hdr->nlmsg_flags & NLM_F_MULTI) {
-    multi_part = true;
-}
+if (!(hdr->nlmsg_flags & NLM_F_MULTI)) {
+    done = true;
+}
A slightly different approach, which avoids confusing NLMSG_DONE with done:
diff --git a/src/common/netif.cpp b/src/common/netif.cpp
index 77246980c3..3f3b8fdcea 100644
--- a/src/common/netif.cpp
+++ b/src/common/netif.cpp
@@ -117,9 +117,10 @@ std::optional<CNetAddr> QueryDefaultGatewayImpl(sa_family_t family)
// Receive response.
char response[4096];
ssize_t total_bytes_read{0};
- bool done{false};
- bool multi_part{false};
- while (!done) {
+ // If we processed at least one message and multi flag not set, or if
+ // we received no valid messages, then we're done.
+ bool maybe_more{true};
+ while (maybe_more) {
int64_t recv_result;
do {
recv_result = sock->Recv(response, sizeof(response), 0);
@@ -135,18 +136,16 @@ std::optional<CNetAddr> QueryDefaultGatewayImpl(sa_family_t family)
return std::nullopt;
}
- bool processed_one{false};
for (nlmsghdr* hdr = (nlmsghdr*)response; NLMSG_OK(hdr, recv_result); hdr = NLMSG_NEXT(hdr, recv_result)) {
rtmsg* r = (rtmsg*)NLMSG_DATA(hdr);
int remaining_len = RTM_PAYLOAD(hdr);
- processed_one = true;
- if (hdr->nlmsg_flags & NLM_F_MULTI) {
- multi_part = true;
- }
+ if (!(hdr->nlmsg_flags & NLM_F_MULTI)) {
+ maybe_more = false;
+ };
if (hdr->nlmsg_type == NLMSG_DONE) {
- done = true;
+ maybe_more = false;
break;
}
@@ -183,12 +182,6 @@ std::optional<CNetAddr> QueryDefaultGatewayImpl(sa_family_t family)
}
}
}
-
- // If we processed at least one message and multi flag is not set,
- // or if we received no valid messages, then we're done.
- if ((processed_one && !multi_part) || !processed_one) {
- done = true;
- }
}
return std::nullopt;
untested crACK 4c53178
- 57ce645 corrects a bug where [...]
- 42e99ad adds a belt and suspenders check that we have received a sensible reply to our [...]
- 4c53178 correctly handles multi-part messages; I believe the current implementation would fail e.g. any time the reply to our [...]

I left a few non-blocking nits, and comments that provide some context that I had to look for when reviewing the changes here.
utACK 4c53178
I didn't specifically test multipart messages. I did test this (rebased) PR with a router running OPNsense 25.7.2 with miniupnpd 2.3.9 using the pf backend. From a macOS 15.6.1 machine and a macOS 13.7.8 machine I can see it opens a port and I'm (still) getting inbound connections. This is unsurprising, because the changes here don't impact macOS. I also tested on Ubuntu 25.04.
Tested but did not dig deep into the code for now. Spent more than a day adding debug code in Bitcoin Core and fighting with my OpenWrt router (openwrt/packages#17871 (comment)). Now that the router has been fully reset without any lingering configs from prior major versions, PCP & NATPMP is actually working. On master (in 7/7 attempts roughly):
On the router side:
Taking the first commit (57ce645) from this PR makes the output towards the end better (my ISP doesn't support IPv6):
And there is no longer any error in the router log. So PCP+IPv4 mostly works regardless of the PR, but at least the failure path for IPv6 is improved by it. I haven't really noticed any changes in behavior with my setup from the latter 2 commits. Might focus on other things for now as this already has 3 ACKs.
You would not be the first person running into router bugs while testing NAT punching functionality :-)
Handle multi-part netlink responses to prevent truncated results from large routing tables. Previously, we only made a single recv call, which led to incomplete results when the kernel split the message into multiple responses (which happens frequently with NLM_F_DUMP). Also guard against a potential hanging issue where the code would indefinitely wait for NLMSG_DONE for non-multi-part responses by detecting the NLM_F_MULTI flag and only continue waiting when necessary.
Force-pushed from 4c53178 to 88db09b (compare).
ACK 88db09b
Code review re-ACK 88db09b
$ git range-diff 4c53178...88db09b
88db09b takes the reviewer suggestion of moving the [...]
re-utACK 88db09b
Do we know if this change might fix the integer overflow error reported in #32345 (comment)?
I moved the variables as per #32159 (comment) to try and fix this in here. I did not verify whether it worked, however.
...for default route in PCP pinholing.
Currently we only make a single recv call, which truncates results from large routing tables, or in the case where the kernel splits the message into multiple responses (which may happen with NLM_F_DUMP).
We also do not filter on the default route. For IPv6, this led to selecting the first route with an RTA_GATEWAY attribute, often a non-default route instead of the actual default. This caused PCP port mapping failures because the wrong gateway was used.
Fix both issues by adding multi-part handling of responses and filtering for the default route.
Limit responses to ~1MB to prevent any router-based DoS.