-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Miscellaneous improvements and fixes concerning the endpoints UID checks and surrounding logic #36392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The function always returns a nil error, hence let's just drop it. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The blamed commit introduced a fallback logic to retrieve the pod corresponding to a given endpoint directly from k8s if the store is out of date, and contains an entry with a mismatching UID. However, the metadata is not recomputed based on the newly retrieved pod, hence potentially configuring incorrect values. Let's get this fixed. Fixes: f6606c6 ("daemon,endpoint,cni: Pass pod UID through CNI ADD") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
72c6530
to
eed421a
Compare
squeed
reviewed
Dec 5, 2024
eed421a
to
0762cc6
Compare
giorio94
commented
Dec 5, 2024
/test |
0762cc6
to
e0b34de
Compare
Introduce a new unit test to cover the handleOutdatedPodInformer function, in preparation for the subsequent changes. In particular, this proves two problems currently affecting it: 1. The function does not propagate the error returned by the fetch function if the endpoint UID is not set. 2. The function does not perform any retry in case of pod UID mismatch, conversely to the intended behavior. --- FAIL: TestHandleOutdatedPodInformer (0.00s) --- FAIL: TestHandleOutdatedPodInformer/pod_not_found_(epUID:_) (0.00s) endpoint_test.go:278: Error Trace: .../daemon/cmd/endpoint_test.go:278 Error: Not equal: expected: *errors.StatusError(&errors.StatusError{... actual : <nil>(<nil>) Test: TestHandleOutdatedPodInformer/pod_not_found_(epUID:_) --- FAIL: TestHandleOutdatedPodInformer/uid_mismatch_(epUID:_uid) (0.00s) endpoint_test.go:288: Error Trace: .../daemon/cmd/endpoint_test.go:288 Error: Not equal: expected: 0x14 actual : 0x1 Test: TestHandleOutdatedPodInformer/uid_mismatch_(epUID:_uid) Messages: Incorrect number of retries --- FAIL: TestHandleOutdatedPodInformer/uid_mismatch,_then_resolved_(epUID:_uid) (0.00s) endpoint_test.go:278: Error Trace: .../daemon/cmd/endpoint_test.go:278 Error: Not equal: expected: <nil>(<nil>) actual : *errors.errorString(&errors.errorString{s:"pod store outdated"}) Test: TestHandleOutdatedPodInformer/uid_mismatch,_then_resolved_(epUID:_uid) endpoint_test.go:288: Error Trace: .../daemon/cmd/endpoint_test.go:288 Error: Not equal: expected: 0x6 actual : 0x1 Test: TestHandleOutdatedPodInformer/uid_mismatch,_then_resolved_(epUID:_uid) Messages: Incorrect number of retries FAIL FAIL github.com/cilium/cilium/daemon/cmd 1.336s FAIL Related: f6606c6 ("daemon,endpoint,cni: Pass pod UID through CNI ADD") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The handleOutdatedPodInformer function currently does not propagate fetch errors if the endpoint UID is not set. Although errors can only occur if either the namespace or the pod are not present yet in the store, let's get this fixed, to prevent possibly accessing uninitialized variables. Fixes: f6606c6 ("daemon,endpoint,cni: Pass pod UID through CNI ADD") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
According to the description of the blamed commit: If the UID is outdated, this means that Cilium has not yet received the updated pod event that corresponds with the CNI ADD. In this case, Cilium will attempt to query the pod store up for to 2 seconds. If the pod store does not have the altest pod reference after 2 seconds, Cilium will trigger a direct fetch from the apiserver [...] However, the current implementation never performs any retry, because the underlying retry logic returns immediately in case an error is returned by the function. Let's fix this by not returning an error in that case, as well as changing the duration period to actually match the expected max retry time of 2 seconds. Fixes: f6606c6 ("daemon,endpoint,cni: Pass pod UID through CNI ADD") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the update of no track rules and bandwidth policies is handled posting an event to the event queue, which in turn executes a callback to retrieve the corresponding pod from the store, and in turn the annotation itself. Let's simplify this by directly propagating the annotation(s) in the event itself, given that all callers have access to pod object, and subsequent annotation updates would just trigger a new event. This also prevents a possible incorrect ending state in case the same annotation is flipped multiple times. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Move the UID mismatch check directly to the function that retrieves the metadata for a given endpoint, and let it return an error in that case. This change ensures that the UID is always properly checked, and allows simplifying the error handling processing in the callers. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
All callers either ignore this field or have direct access to the full pod object. Hence, let's simply remove it, to prevent possible ambiguity. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
e0b34de
to
bc5e3e3
Compare
/test |
squeed
approved these changes
Dec 6, 2024
christarazi
approved these changes
Dec 6, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for simplifying and fixing this code
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
affects/v1.14
This issue affects v1.14 branch
affects/v1.15
This issue affects v1.15 branch
affects/v1.16
This issue affects v1.16 branch
area/agent
Cilium agent related.
area/daemon
Impacts operation of the Cilium daemon.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/misc
This PR makes changes that have no direct user impact.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Please review commit by commit, and refer to the individual descriptions for additional details.
I'm currently a bit on the fence whether any commit would need to be backported to earlier branches, although I'm leaning more towards not backporting them given that they address very specific corner case situations. The only issue which looks a bit more problematic to me is the one fixed in 6bebde0, because it could lead to accessing a nil variable, but it is my understanding that the UID is always guaranteed to be set. Anyways, please let me know your thoughts during the review.
Related: #36292 (comment)
/cc @squeed