-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[v1.12] Remove remote-node labels from ipcache on node delete #27406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit f9fe611. The lock acquisition ordering in this original solution to the problem could trigger deadlocks when the ipcache is concurrently updated by multiple subsystems. Signed-off-by: Joe Stringer <joe@cilium.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix looks good to me.
Did you consider whether we need to also call RemoveMetadata
in the NodeUpdated
case which handles an update (old node -> new node) for the old node? The line I'm thinking about is here.
437bcaf
to
5550013
Compare
Hmm, at a glance it looks like we should add it there too. I have a reproducer via Maartje for the "delete node" case, but I'm not sure I understand how to reassign IPs for existing nodes in order to exercise that path. |
5550013
to
8468e03
Compare
The easiest thing that comes to mind is manually doing a |
/test-backport-1.12 |
8468e03
to
3572d3f
Compare
Previously the node manager code only ever performed upserts into the ipcache for remote-node labels on node creation/update events. However, it never cleaned up these labels upon delete. As a result, if a node was ever removed, then Cilium would continue to consider traffic towards that IP as traffic reaching towards a remote node. In extreme cases, this IP could be reallocated to another host in the network which could cause connectivity disruption to that peer, particularly if the new peer is subject to CIDR or Entities based policies. Signed-off-by: Joe Stringer <joe@cilium.io>
3572d3f
to
067523f
Compare
/test-backport-1.12 |
/test-1.16-4.9 |
/test-1.21-4.9 |
func (ipc *IPCache) RemoveMetadata(prefix string, lbls labels.Labels, src source.Source) { | ||
ipc.metadata.Lock() | ||
ipc.Lock() | ||
_ = ipc.removeLabels(prefix, lbls, src) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up that this solves cases where there is a clear disassociation of the node first (eg node is deleted from k8s) before associating the node with another resource like a service, fqdn or kube-apiserver.
In order to solve the other order, we need to handle the resulting labels from this function. I think it's reasonable to review this PR as-is since this at least fixes two bugs (deadlock + deletion of the node followed by association with another resource), then we can follow up on the other case.
Previously the node manager code only ever performed upserts into the
ipcache for remote-node labels on node creation/update events. However,
it never cleaned up these labels upon delete. As a result, if a node was
ever removed, then Cilium would continue to consider traffic towards
that IP as traffic reaching towards a remote node. In extreme cases,
this IP could be reallocated to another host in the network which could
cause connectivity disruption to that peer, particularly if the new peer
is subject to CIDR or Entities based policies.
Commits:
Related: #26958
Fixes: #27382
Passes the first two test cases here:
https://gist.github.com/joestringer/d7f55da5dd2f33be6148dd52a96a4a54