-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[v1.13] Delete IP Label metadata on delete from ipcache #27010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change breaks in 1.13, converting to draft |
f2bc2ed
to
f0f056b
Compare
/test |
I think there should be a call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f0f056b
to
c8bcbac
Compare
I modified RemoveMetadata to remove all entries of an IP instead of a select set, the function was not used anywhere so did not cause any issues. |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous review comment still applies.
e972a4f
to
4cdca3e
Compare
/test-backport-1.13 Job 'Cilium-PR-K8s-1.17-kernel-4.19' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.17-kernel-4.19/98/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
Thrown away all work and started fresh given the architecture is way different than 1.12 :) |
@meyskens heads up that I ended up preparing a subtly different fix on v1.12 here, may be worth comparing for the approach on v1.13: #27406. I also extended your testing script here, which has two tests that now pass and two tests that still fail on all branches. For this bugfix it's fine to fix the first two test cases, then we can separately take a look at the second two test cases. I know that there's more differences with v1.13 and this code has been heavily changing over time, so we'll need to be a bit careful around how we introduce the changes. |
4cdca3e
to
e312165
Compare
This pull request has been automatically marked as stale because it |
Commit d564cb947bc65aa883504fe36c610301e164783c does not match "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
d564cb9
to
e312165
Compare
This commit is based on v1.12 changes made in 067523f It has the same added function calls but refactored to work with the resource ID based metadata cache of v1.13. Previously the node manager code only ever performed upserts into the ipcache for remote-node labels on node creation/update events. However, it never cleaned up these labels upon delete. As a result, if a node was ever removed, then Cilium would continue to consider traffic towards that IP as traffic reaching towards a remote node. In extreme cases, this IP could be reallocated to another host in the network which could cause connectivity disruption to that peer, particularly if the new peer is subject to CIDR or Entities based policies. Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com>
e312165
to
c8eed5e
Compare
} | ||
|
||
for _, address := range []net.IP{ | ||
entry.node.IPv4HealthIP, entry.node.IPv6HealthIP, | ||
entry.node.IPv4IngressIP, entry.node.IPv6IngressIP, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the node gets deleted, this code still needs to remove the association for the old health and ingress IPs. Was it intentional to remove the iteration here for these additional addresses?
resource := ipcacheTypes.NewResourceID(ipcacheTypes.ResourceKindNode, "", n.Name) | ||
|
||
m.removeFromIDMD(prefix, remoteHostIdentity, resource) | ||
m.ipcache.Delete(ip, n.Source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be using the non-prefix representation of the string like 192.0.2.3
, as compared with the prefix representation like 192.0.2.3/32
as like the call six lines up. However, NodeUpdated()
doesn't seem to do an ipcache.Upsert(...)
with this representation. So this seems to be deleting an association in the ipcache that was never created when the node was initially discovered? I don't think that this is correct.
I'm turning the PR to draft, please mark as ready again once it's unblocked and ready to move forwards. |
Superseded by #28972 |
v1.13 backport of #26958