-
Notifications
You must be signed in to change notification settings - Fork 3.4k
v1.16 Backports 2024-10-30 #35654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
v1.16 Backports 2024-10-30 #35654
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit 5d1951b ] According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. #32099), the logged error message can still cause CI to fail (e.g. #35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit abbe2b8 ] See commit `netlink: Add wrapper for functions that fail with `ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 428d2c4 ] See commit `netlink: Add wrapper for functions that fail with `ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 7facbaf ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit dec70b4 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit c271659 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 91659fd ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit acf9e04 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit e8c4d45 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit ef780e2 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
michi-covalent
approved these changes
Oct 30, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌯
[ upstream commit d7668b0 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 5d38674 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. In addition, this commit also removes some existing retry mechanisms with the canoncial safenetlink version. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit caf6088 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit c0c3244 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 04b6de8 ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit f8cb55d ] See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. This commit is a bit different from the previous ones as it uses the retry helper rather than a wrapper. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 3824b6b ] This adds a new check script to ensure any functions which has a safe wrapper in the `safenetlink` package uses the safe variant. See commit `netlink: Add wrapper for functions that fail with ErrDumpInterrupted` for details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
0fedbd7
to
0037c95
Compare
/test-backport-1.16 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.16
This PR represents a backport for Cilium 1.16.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
netlink
functions that may fail withErrDumpInterrupted
#35614 (@gandro)Once this PR is merged, a GitHub action will update the labels of these PRs: