allocator: Fix kvstore identity leak #34893
Conversation
/test
This pull request has been automatically marked as stale because it …
Ahh. I'll rebase this and fix the conflict as well as remove the extra code in #34893 (comment) tomorrow morning.
Force-pushed from 9d17929 to 7a2c1f5
Got stuck with other work, but did the rebase now. Will read through on Monday and then remove the …
Force-pushed from 7a2c1f5 to b1dd783
After running the reproduction script in #35451 with this code, with a …, the following is the output while the script is creating and deleting pods:

$ kubectl exec -n kube-system daemonset/cilium -- cilium kvstore get --recursive "cilium/state/identities/v1/value/" 2>&1 |egrep "fix-"
cilium/state/identities/v1/value/k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=default;k8s:run=fix-745;/172.18.0.2 => 1377

After stopping the script, once all pods are deleted:

$ kubectl exec -n kube-system daemonset/cilium -- cilium kvstore get --recursive "cilium/state/identities/v1/value/" 2>&1 |egrep "fix-"
<no output>
Thanks for discovering this issue and proposing a fix. Your patch overall looks great to me -- I've just left a few minor comments inline.
Force-pushed from b1dd783 to 9b1782b
Force-pushed from 9b1782b to e5dbba8
Thanks for looking at this @giorio94! Pushed a new version now. Been running it in a loop locally and it seems to work as expected. Running locally I see some of these (with a tiny kvstore-sync value):
…
so it all seems to work as intended; and after all pods are gone, all the "slave keys" are released correctly and we have no leaks.
Force-pushed from e5dbba8 to 16a8184
Thanks, looks great to me 🚀 Just a couple of nits inline.
/test
The syncLocalKeys call will, with some small probability, re-create slave keys for identities that are no longer used on the node. This often happens immediately after the last reference on the node is cleaned up and the slave key is deleted during the Release portion of the allocator. This results in the slave key being present until the etcd lease expires, and that usually doesn't happen until the agent restarts. This is not an issue for the master keys, as they do not use a per-node lease and are cleaned up by the operator at an interval.

In large clusters with a lot of identity and pod churn, we see this happening multiple times an hour. It can be investigated by looking at the "Re-created missing slave key" logline. This often leads to several thousand extra identities in the clusters that are unused and not cleaned up.

This function will now ensure, as best it can, that it does not upsert the key if it is no longer in use. And if the key is unused after the upsert is done, it grabs the lock to double-check; if it is no longer in use, it releases it to ensure it is cleaned up.

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
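For readers following along, here is a minimal sketch of the guard the commit message describes, using hypothetical stand-in types (`Key`, `allocator`, `kvBackend`) rather than the real Cilium allocator API:

```go
package allocatorsketch

import (
	"context"
	"sync"
)

// Hypothetical stand-ins for the allocator pieces discussed above; the real
// Cilium types and method names differ.
type Key string

type kvBackend interface {
	UpdateKey(ctx context.Context, id uint64, key Key) error
	Release(ctx context.Context, id uint64, key Key) error
}

type allocator struct {
	slaveKeysMutex sync.Mutex
	localKeys      map[Key]uint64 // key -> ID referenced locally (0/absent = unused)
	kv             kvBackend
}

// syncLocalKey sketches the "check, upsert, re-check under the lock" pattern:
// skip keys that are no longer referenced on this node, and if the last
// reference is dropped while the upsert is in flight, release the key again
// so the slave key does not linger until the etcd lease expires.
func (a *allocator) syncLocalKey(ctx context.Context, key Key) error {
	a.slaveKeysMutex.Lock()
	id := a.localKeys[key]
	a.slaveKeysMutex.Unlock()
	if id == 0 {
		return nil // not in use on this node; don't re-create the slave key
	}

	if err := a.kv.UpdateKey(ctx, id, key); err != nil {
		return err
	}

	// The last local reference may have been released while UpdateKey was in
	// flight; double-check under the lock and undo the upsert if so.
	a.slaveKeysMutex.Lock()
	defer a.slaveKeysMutex.Unlock()
	if a.localKeys[key] == 0 {
		return a.kv.Release(ctx, id, key)
	}
	return nil
}
```

The point of the pattern is that the lock is only held for the two cheap map lookups, never across the kvstore round-trip.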
Force-pushed from 16a8184 to c1ad50a
/test
Thanks!
This fixes an identity leak in the kvstore implementation that has been here since the beginning. If the kvstore-refresh sync dumps the loadKeys immediately before a key is released, it will try to update that key. In large clusters with a lot of identity and pod churn, we see this happening multiple times an hour. It can be investigated by looking at the "Re-created missing slave key" logline. This often leads to several thousand extra identities in the clusters that are unused and not cleaned up.

This could also be easily mitigated by always grabbing the `slaveKeysMutex`, but that would most likely slow down identity allocation on the agents considerably, so it is probably not a good idea (see the sketch after this description for what holding the lock across the kvstore round-trip would look like).

The terms `slave` and `master` key are also a bit confusing; they are sometimes explained as `ID` and `reference` (e.g. via "acquire reference"), so a consistent and better naming scheme would help the readability of the code. We can probably save that for another PR.

See the commits for a more detailed explanation.
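For contrast, a minimal sketch of the dismissed alternative mentioned above, again with hypothetical stand-in types rather than the real allocator code:

```go
package allocatorsketch

import (
	"context"
	"sync"
)

// Same hypothetical stand-ins as in the earlier sketch; not the real Cilium
// allocator API.
type Key string

type kvBackend interface {
	UpdateKey(ctx context.Context, id uint64, key Key) error
}

type allocator struct {
	slaveKeysMutex sync.Mutex
	localKeys      map[Key]uint64
	kv             kvBackend
}

// syncLocalKeyLocked is the naive mitigation: hold slaveKeysMutex for the
// whole kvstore round-trip. It avoids the re-creation race, but every
// identity allocation/release on the agent now waits behind a network call
// to etcd during each refresh.
func (a *allocator) syncLocalKeyLocked(ctx context.Context, key Key) error {
	a.slaveKeysMutex.Lock()
	defer a.slaveKeysMutex.Unlock()

	id := a.localKeys[key]
	if id == 0 {
		return nil
	}
	return a.kv.UpdateKey(ctx, id, key) // network round-trip under the lock
}
```

Compared to the approach taken in the commits (check before the upsert, re-check under the lock afterwards), this variant keeps the lock held across the etcd call, which is exactly what would slow down identity allocation.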
Fixes: #35451