fix: check nlmsghdr flags for interrupt #925

adrianchiris · 2023-11-12T14:21:47Z

Fail if the NLM_F_DUMP_INTR flag is set.

Signed-off-by: adrianc <adrianc@nvidia.com>

adrianchiris · 2023-11-12T14:22:23Z

nl/nl_linux.go

-//      < code which needs to be executed in specific netns>
-//  }
+//
+//	func jobAt(...) error {


While unrelated to this fix, the linter keeps rewriting this comment because of style issues.

adrianchiris · 2023-11-12T14:23:23Z

This check is similar to what is performed in libmnl [1].

[1] https://github.com/justmirror/libmnl/blob/f14732339a77a2e6ba9ed4ca99b347ce1dc60801/src/callback.c#L70
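A minimal sketch of the kind of check described above, not the actual code from this PR. The type `nlMsghdr`, the helper `checkDumpIntr`, and the variable `errDumpIntr` are hypothetical names introduced for illustration. The flag value 0x10 mirrors `NLM_F_DUMP_INTR` in the Linux netlink header, and reporting `EINTR` is an assumption based on the downstream reports linked from this PR.

```go
package main

import (
	"fmt"
	"syscall"
)

// nlmFDumpIntr mirrors NLM_F_DUMP_INTR (0x10) from the Linux netlink uAPI
// header. It is declared locally so the sketch builds without extra modules.
const nlmFDumpIntr uint16 = 0x10

// nlMsghdr is a hypothetical stand-in for the kernel's struct nlmsghdr.
type nlMsghdr struct {
	Len   uint32
	Type  uint16
	Flags uint16
	Seq   uint32
	Pid   uint32
}

// errDumpIntr is the error this sketch reports for an interrupted dump
// (assumed here to be EINTR).
var errDumpIntr error = syscall.EINTR

// checkDumpIntr fails when the header says the dump was interrupted, which
// means the data collected so far may be inconsistent.
func checkDumpIntr(h nlMsghdr) error {
	if h.Flags&nlmFDumpIntr != 0 {
		return errDumpIntr
	}
	return nil
}

func main() {
	const nlmFMulti uint16 = 0x2 // NLM_F_MULTI: part of a multipart reply

	clean := nlMsghdr{Flags: nlmFMulti}
	interrupted := nlMsghdr{Flags: nlmFMulti | nlmFDumpIntr}

	fmt.Println("clean:", checkDumpIntr(clean))
	fmt.Println("interrupted:", checkDumpIntr(interrupted))
}
```

Callers that see this error cannot trust a partial result, so the usual reaction is to repeat the whole dump request.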

aboch · 2023-11-12T20:06:00Z

LGTM

Hunch is that the change in vishvananda/netlink#1018 will prevent EBUSY by draining the remaining messages from the kernel before returning.

Before vishvananda/netlink#925, all messages would be read but the "interrupted" flag was ignored. That PR made it so that the dump would return early, but that would leave some messages unconsumed. There was logic elsewhere to discard these messages, but the kernel has a check for "dump in progress" at the start of a new dump and it returns EBUSY in that case:

```
int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
			 const struct nlmsghdr *nlh,
			 struct netlink_dump_control *control)
{
	...
	mutex_lock(&nlk->nl_cb_mutex);
	/* A dump is in progress... */
	if (nlk->cb_running) {
		ret = -EBUSY;
		goto error_unlock;
	}
```

I wasn't able to trace the exact path through the kernel code to verify when cb_running would be set, or that the error would be returned on the NLMSG_DONE as we see here, but it seemed like a solid bet.
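The difference between returning early and draining can be sketched in Go over an in-memory message queue. This is a simulation under stated assumptions, not the code from #1018: `msg`, `queue`, `drainDump`, and `errDumpInterrupted` are hypothetical names, and no real netlink socket is involved. The constants mirror `NLMSG_DONE` (0x3) and `NLM_F_DUMP_INTR` (0x10) from the Linux netlink header.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	nlmsgDone    uint16 = 0x3  // mirrors NLMSG_DONE
	nlmFDumpIntr uint16 = 0x10 // mirrors NLM_F_DUMP_INTR
)

// msg is a hypothetical, simplified netlink message.
type msg struct {
	Type    uint16
	Flags   uint16
	Payload string
}

// queue stands in for the messages still pending on a socket.
type queue struct{ msgs []msg }

// recv pops the next pending message; ok is false when nothing is left.
func (q *queue) recv() (m msg, ok bool) {
	if len(q.msgs) == 0 {
		return msg{}, false
	}
	m, q.msgs = q.msgs[0], q.msgs[1:]
	return m, true
}

var (
	errDumpInterrupted = errors.New("dump interrupted, results may be inconsistent")
	errTruncated       = errors.New("stream ended before NLMSG_DONE")
)

// drainDump reads every message up to and including NLMSG_DONE. It remembers
// whether any message carried the interrupt flag, but does not stop reading,
// so nothing from this dump is left behind for the next request.
func drainDump(recv func() (msg, bool)) ([]string, error) {
	var out []string
	interrupted := false
	for {
		m, ok := recv()
		if !ok {
			return out, errTruncated
		}
		if m.Flags&nlmFDumpIntr != 0 {
			interrupted = true
		}
		if m.Type == nlmsgDone {
			break
		}
		out = append(out, m.Payload)
	}
	if interrupted {
		return out, errDumpInterrupted
	}
	return out, nil
}

func main() {
	q := &queue{msgs: []msg{
		{Type: 16, Flags: nlmFDumpIntr, Payload: "link-a"},
		{Type: 16, Payload: "link-b"},
		{Type: nlmsgDone},
	}}
	res, err := drainDump(q.recv)
	fmt.Println(res, err, "pending:", len(q.msgs))
}
```

The design point is that the error is reported only after the terminating message has been consumed, so the caller gets both the partial results and an empty queue.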

…andanda/netlink ..that may return EINTR.

Fixes: ovn-kubernetes#5358

A change to the vishvananda/netlink package (vishvananda/netlink#925 in v1.2.1) exposes NLM_F_DUMP_INTR in some netlink responses as an EINTR error return, but it also doesn't return any data, so all we can do is retry. netlink safe provides this with wrappers around certain netlink functions.

This has been causing failures in some unit tests that interact with netlink. Retry the requests on EINTR, up to five times.

Also, add RuleList to the API and switch the IP rule manager to use it. The reason it wasn't done before is that the IP rule manager doesn't fake out netlink for testing.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
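A retry wrapper of the kind this commit message describes might look like the sketch below. It is an illustration only, not the ovn-kubernetes implementation: `retryOnIntr`, `maxAttempts`, and `errIntr` are hypothetical names, and the wrapped function is a stub, not a real netlink call.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// maxAttempts follows the "up to five times" limit from the commit message.
const maxAttempts = 5

// errIntr is the error the stubbed operations below return when interrupted.
var errIntr error = syscall.EINTR

// retryOnIntr calls f until it returns something other than EINTR, or until
// maxAttempts calls have been made. The last result and error are returned.
func retryOnIntr[T any](f func() (T, error)) (T, error) {
	var res T
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		res, err = f()
		if !errors.Is(err, syscall.EINTR) {
			return res, err
		}
	}
	return res, err
}

func main() {
	calls := 0
	// Stub for a dump-style call that is interrupted twice before succeeding.
	listRules := func() ([]string, error) {
		calls++
		if calls < 3 {
			return nil, errIntr
		}
		return []string{"rule-1", "rule-2"}, nil
	}

	rules, err := retryOnIntr(listRules)
	fmt.Println(rules, err, "calls:", calls)
}
```

Bounding the number of attempts keeps a busy system, where dumps are interrupted repeatedly, from blocking the caller forever. The final EINTR is returned so the caller can decide what to do.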

fix: check nlmsghdr flags for interrupt

afdfb73

fail if NLM_F_DUMP_INTR flag is set

Signed-off-by: adrianc <adrianc@nvidia.com>

aboch merged commit aa4f20d into vishvananda:main Nov 12, 2023

This was referenced Aug 30, 2024

Fix recvfrom goroutine leak #793

Merged

networking: investigate EINTR regression after updating github.com/vishvananda/netlink to v1.3.0 moby/moby#48400

Closed

This was referenced Sep 9, 2024

Preserve results when NLM_F_DUMP_INTR is set #1018

Merged

Retry on EINTR from netlink dump calls moby/moby#48407

Merged

jrajahalme mentioned this pull request Oct 13, 2024

datapath: Retry netlink.LinkList when interrupted cilium/cilium#35259

Merged

aojea mentioned this pull request Jan 10, 2025

vishvananda Netlink breaking changes in 1.2.1 kubernetes/kubernetes#129562

Closed

This was referenced Jan 29, 2025

Fix EBUSY errors from netlink operations. projectcalico/calico#9769

Merged

LinkList and similar returning "device or resource busy" (EBUSY) #1057

Open

martinkennelly mentioned this pull request Jul 19, 2025

Retry selected netlink ops because netlink notified us that config changed during operation and results unreliable ovn-kubernetes/ovn-kubernetes#5402

Open
