-
Notifications
You must be signed in to change notification settings - Fork 3.4k
treewide: Add wrapper for netlink
functions that may fail with ErrDumpInterrupted
#35614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e4f6449
to
ad479a5
Compare
See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. In addition, this commit also removes some existing retry mechanisms with the canoncial safenetlink version. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. This commit is a bit different from the previous ones as it uses the retry helper rather than a wrapper. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This adds a new check script to ensure any functions which has a safe wrapper in the `safenetlink` package uses the safe variant. See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
ceddefd
to
feff2df
Compare
I only found one usage which might be buggy: cilium/pkg/datapath/sockets/sockets.go Lines 281 to 283 in 21c2608
All other users are only importing types, but are not actually invoking netlink requests. I believe that function should check for |
/test |
This ensures that safenetlink package is used across the whole tree, in particular also in unit tests (where EINTR can cause flaky tests) and some invocations via handle which were previously missed (see cilium#35614 for context why safenetlink is required). This prepares the tree for the use of the forbidigo linter via golangci. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This ensures that safenetlink package is used across the whole tree, in particular also in unit tests (where EINTR can cause flaky tests) and some invocations via handle which were previously missed (see cilium#35614 for context why safenetlink is required). This prepares the tree for the use of the forbidigo linter via golangci. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This ensures that safenetlink package is used across the whole tree, in particular also in unit tests (where EINTR can cause flaky tests) and some invocations via handle which were previously missed (see cilium#35614 for context why safenetlink is required). This prepares the tree for the use of the forbidigo linter via golangci. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This ensures that safenetlink package is used across the whole tree, in particular also in unit tests (where EINTR can cause flaky tests) and some invocations via handle which were previously missed (see cilium#35614 for context why safenetlink is required). This prepares the tree for the use of the forbidigo linter via golangci. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This ensures that safenetlink package is used across the whole tree, in particular also in unit tests (where EINTR can cause flaky tests) and some invocations via handle which were previously missed (see #35614 for context why safenetlink is required). This prepares the tree for the use of the forbidigo linter via golangci. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This ensures that safenetlink package is used across the whole tree, in particular also in unit tests (where EINTR can cause flaky tests) and some invocations via handle which were previously missed (see cilium#35614 for context why safenetlink is required). This prepares the tree for the use of the forbidigo linter via golangci. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
According to the kernel docs1, the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by
NLM_F_DUMP_INTR
. Thevishvananda/netlink
library returnedEINTR
since v1.2.1, but more recent versions have changed it such that it returnsnetlink.ErrDumpInterrupted
instead2.These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all
netlink
functions marked to returnErrDumpInterrupted
that retries the function up to 30 times until it either succeeds or returns a different error.While may call sites do have their own high-level retry mechanism (see e.g. #32099), the logged error message can still cause CI to fail (e.g. #35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to
NLM_F_DUMP_INTR
during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly.The last commit adds an additional linter that nudges developers to use this new
safenetlink
package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually.Review per commit
Footnotes
https://docs.kernel.org/userspace-api/netlink/intro.html ↩
https://github.com/vishvananda/netlink/pull/1018 ↩