-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Tidy up Kubernetes watcher synchronization #17677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy up Kubernetes watcher synchronization #17677
Conversation
Refactor this logic into a dedicated function to simplify later commits. No functional changes. Signed-off-by: Joe Stringer <joe@cilium.io>
Previously, the list of resources to be initialized would be prepared in k.EnableK8sWatcher(), but the logic to handle the wait for cache sync could select a different set of resources. This PR tidies these up into smaller functions divided by the functionality they provide (core CNI or feature additions), and consistently enables these using the pkg/option configurations. In future we can potentially even remove unused CRDs from the cluster and Cilium will not complain that they are unavailable; this commit doesn't try to go quite that far yet. Signed-off-by: Joe Stringer <joe@cilium.io>
Shift the declaration of the CRD resources used by the agent & other applications into pkg/k8s/synced, then declare a mapping from that canonical list of resources to the corresponding groups inside pkg/k8s/watchers. This makes both pieces of code synchronized. If for instance CiliumEndpoint CRDs are disabled, they are globally disabled everywhere rather than just in part of the agent code. Signed-off-by: Joe Stringer <joe@cilium.io>
Some resources may be dynamically disabled by configuration options in the agent / operator. It doesn't make sense to register these resources into Kubernetes if they are disabled in the agent, so don't register them in that case. Signed-off-by: Joe Stringer <joe@cilium.io>
ci-multicluster |
ci-external-workloads |
/test |
The two testsuites that were broken by the previous implementation are now passing:
The problem was that this PR makes the CRD registration conditional for optional features like LRP, egress gateway. However I mistakenly set it up so that some of the daemons would wait for all resources to be created on startup, not just the ones that are currently enabled. This led to the following issue in
This is now fixed in the latest version. |
Here's the diff from the previous version I posted, basically the notion of "override" didn't really make sense; all code should just respect the conditional enablement of specific resources.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
// We don't add the service option modifier here, as endpointslices do not | ||
// mirror service proxy name label present in the corresponding service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch just moves code around, but reading above comment got me wondering: does here
imply that the "service option modifier" is added elsewhere? If yes, should we mention where this happens in the comment? If no, should we remove here
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the original author of the comment @fristonio or some of the other @cilium/sig-k8s folks could chime in here. Original commit that introduced the comment is c6eae50 ("k8s: honor the service.kubernetes.io/service-proxy-name label") , but from a glance the comment doesn't mean much to me.
The ConfermanceEKS job hit #16938 , but otherwise CI is all green. Should be good to merge given that the PR was previously reviewed via #17145, the diff was reviewed, and the tests which had previously regressed are now fixed with the latest diff. |
Cilium should no longer register or wait to sync k8s resources that are disabled such as LRP.
See commits for more details.
Previously merged in #17145 and reverted in #17675 due to breakage of external
workloads and clustermesh testsuites which were marked optional at the time.