-
Notifications
You must be signed in to change notification settings - Fork 3.4k
watchers: disable CRD to kvstore handover logic #35741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/test |
63144ba
to
850d953
Compare
/test |
850d953
to
9a8a414
Compare
/test |
CI is green, except for an unrelated flake. Dropping the test commits from #35679, and marking ready for review. |
When running in kvstore mode, Cilium currently still starts the CiliumNode and CiliumEndpoint informers, stopping them once connecting to etcd. Informers are then potentially restarted if the client disconnects from etcd again. This handover logic, which has been historically introduced to support running etcd in pod network, is known to be complex and fragile (as it can potentially lead to stale entries), and introduces extra load on the Kubernetes API Server during Cilium agent startup. Considering that this handover logic is not needed when etcd runs outside of the cluster (or in host network) and support for running etcd in pod network has been officially deprecated in v1.16 [1], let's disable it altogether. Still, let's keep it behind a hidden flag for the moment, to allow easily enabling it again in case of problems. If nothing unexpected comes up, we can then definitely remove it in the future. The same flag is used in the context of the pod watcher as well, to avoid switching to watching all the pods in the cluster (rather than only the ones hosted by the current node) when the CiliumEndpoint CRD is disabled. Indeed, this was again needed to support running etcd in pod network, but completely defeats the performance benefits associated with disabling the CiliumEndpoint CRD. [1]: f99f10b ("docs: Deprecate support for podnetwork etcd") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
When Cilium is configured in KVstore mode, and support for running the kvstore in pod network is disabled, we don't start the CiliumEndpoints informer at all. Hence, let's disable this GC logic as well, given that it would otherwise need to start it to populate the store content. Indeed, no one is expected to be watching them, and we can accept the possibility that we leak a few objects in very specific and rare circumstances [1], until the corresponding pod gets deleted. The respective kvstore entries, which are not taken into account here, will be instead eventually deleted when the corresponding lease expires. Even better, we could disable the CiliumEndpoint CRD altogether in this configuration, given that CiliumEndpoints are not expected to be used at all. However, that comes with extra risks, especially during the initial migration phase, when old agents may still watch them in case of restarts, and given that the feature is not really battle tested. Hence, let's go for this simpler, and safer, approach for the moment. [1]: #20350 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
9a8a414
to
5f127c4
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, we could disable the CiliumEndpoint CRD altogether in this configuration, given that CiliumEndpoints are not expected to be used at all. However, that comes with extra risks, especially during the initial migration phase, when old agents may still watch them in case of restarts, and given that the feature is not really battle tested. Hence, let's go for this simpler, and safer, approach for the moment.
I would say it makes sense from the scalability point of view to disable it by default, but from the user point of view, CEP CRD is still nice to check endpoint and identity association without needing to check Etcd.
Hmm, I see your point. However, in that configuration we already lack the information about the identities (which are stored only inside etcd), so it shouldn't make a lot of difference. Especially because if there's any problem etcd would be the only source of truth, while CEPs would be informative only. That said, it would definitely be nice to have additionally tooling to simplify the inspection of the etcd state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@giorio94 Nice work Marco!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack for the doc change
@nathanjsweet Gentle ping 🙏 |
When running in kvstore mode, Cilium currently still starts the CiliumNode and CiliumEndpoint informers, stopping them once connecting to etcd. Informers are then potentially restarted if the client disconnects from etcd again. This handover logic, which has been historically introduced to support running etcd in pod network, is known to be complex and fragile (as it can potentially lead to stale entries), and introduces extra load on the Kubernetes API Server during Cilium agent startup.
Considering that this handover logic is not needed when etcd runs outside of the cluster (or in host network) and support for running etcd in pod network has been officially deprecated in v1.16 [1], let's disable it altogether. Still, let's keep it behind a hidden flag for the moment, to allow easily enabling it again in case of problems. If nothing unexpected comes up, we can then definitely remove it in the future.
The same flag is used in the context of the pod watcher as well, to avoid switching to watching all the pods in the cluster (rather than only the ones hosted by the current node) when the CiliumEndpoint CRD is disabled. Indeed, this was again needed to support running etcd in pod network, but completely defeats the performance benefits associated with disabling the CiliumEndpoint CRD.
[1]: f99f10b ("docs: Deprecate support for podnetwork etcd")
When Cilium is configured in KVstore mode, and support for running the kvstore in pod network is disabled, we don't start the CiliumEndpoints informer at all. Hence, let's disable this GC logic as well, given that it would otherwise need to start it to populate the store content. Indeed, no one is expected to be watching them, and we can accept the possibility that we leak a few objects in very specific and rare circumstances [1], until the corresponding pod gets deleted. The respective kvstore entries, which are not taken into account here, will be instead eventually deleted when the corresponding lease expires.
Even better, we could disable the CiliumEndpoint CRD altogether in this configuration, given that CiliumEndpoints are not expected to be used at all. However, that comes with extra risks, especially during the initial migration phase, when old agents may still watch them in case of restarts, and given that the feature is not really battle tested. Hence, let's go for this simpler, and safer, approach for the moment.
[1]: #20350