adrianchiris
Collaborator

fail if NLM_F_DUMP_INTR flag is set

Signed-off-by: adrianc <adrianc@nvidia.com>

```
// < code which needs to be executed in specific netns>
// }
//
// func jobAt(...) error {
```

Collaborator Author

While unrelated, the linter keeps fixing this because of style issues.

@adrianchiris
Collaborator Author

This check is similar to what is performed in libmnl [1].

[1] https://github.com/justmirror/libmnl/blob/f14732339a77a2e6ba9ed4ca99b347ce1dc60801/src/callback.c#L70

@aboch
Collaborator

aboch commented Nov 12, 2023

LGTM

@aboch aboch merged commit aa4f20d into vishvananda:main Nov 12, 2023
fasaxc added a commit to fasaxc/calico that referenced this pull request Jan 29, 2025
The hunch is that the change in vishvananda/netlink#1018
will prevent EBUSY by draining the remaining messages from the kernel
before returning.

Before vishvananda/netlink#925, all messages
would be read but the "interrupted" flag was ignored. That PR made
the dump return early, but that left some messages unconsumed.
There was logic elsewhere to discard these messages,
but the kernel checks for a "dump in progress" at the start of a new
dump and returns EBUSY in that case:

```
int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
			 const struct nlmsghdr *nlh,
			 struct netlink_dump_control *control)
{
...
	mutex_lock(&nlk->nl_cb_mutex);
	/* A dump is in progress... */
	if (nlk->cb_running) {
		ret = -EBUSY;
		goto error_unlock;
	}
```

I wasn't able to trace the exact path through the kernel code to verify
when cb_running would be set, or that the error would be returned
on the NLMSG_DONE as we see here, but it seemed like a solid bet.
martinkennelly added a commit to martinkennelly/ovn-kubernetes-1 that referenced this pull request Jul 19, 2025
…andanda/netlink

..that may return EINTR.

Fixes: ovn-kubernetes#5358

A change to the vishvananda/netlink package (vishvananda/netlink#925 in v1.2.1)
exposes NLM_F_DUMP_INTR in some netlink responses as an EINTR error return, but
it also doesn't return any data. Therefore all we can do is retry, and netlink
safe provides this with wrappers around certain netlink functions.

This has been causing failures in some unit tests that interact with netlink.

Retry the requests on EINTR, up to five times.

Also, add RuleList to the API and switch the IP rule manager to use it.
It wasn't done before because the IP rule manager doesn't fake out
netlink for testing.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>