Skip to content

Conversation

meyskens
Copy link
Contributor

v1.13 backport of #26958

Delete IP Label metadata on delete from ipcache

@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.13 kind/backports This PR provides functionality previously merged into master. labels Jul 24, 2023
@meyskens meyskens changed the title Delete IP Label metadata on delete from ipcache [v1.13] Delete IP Label metadata on delete from ipcache Jul 24, 2023
@meyskens meyskens marked this pull request as ready for review July 24, 2023 07:13
@meyskens meyskens requested a review from a team as a code owner July 24, 2023 07:13
@meyskens meyskens requested review from a team and nathanjsweet and removed request for a team July 24, 2023 07:15
@meyskens
Copy link
Contributor Author

This change breaks in 1.13, converting to draft

@meyskens meyskens marked this pull request as draft July 24, 2023 07:16
@meyskens meyskens force-pushed the meyskens/113-ipcache-meta-delete branch 2 times, most recently from f2bc2ed to f0f056b Compare July 24, 2023 08:23
@meyskens meyskens marked this pull request as ready for review July 26, 2023 20:29
@meyskens
Copy link
Contributor Author

/test

@christarazi
Copy link
Member

I think there should be a call to RemoveLabels in (m *Manager) deleteIPCache to be symmetric with (m *Manager) upsertIntoIDMD in pkg/node/manager/manager.go because that will properly clean up the metadata map. The RemoveLabels function does not exist in v1.12 version btw.

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@meyskens meyskens force-pushed the meyskens/113-ipcache-meta-delete branch from f0f056b to c8bcbac Compare July 27, 2023 10:35
@meyskens
Copy link
Contributor Author

I modified RemoveMetadata to remove all entries of an IP instead of a select set, the function was not used anywhere so did not cause any issues.

@meyskens meyskens requested a review from christarazi July 27, 2023 11:15
@meyskens
Copy link
Contributor Author

/test

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous review comment still applies.

@meyskens meyskens marked this pull request as draft August 1, 2023 07:44
@meyskens meyskens force-pushed the meyskens/113-ipcache-meta-delete branch 2 times, most recently from e972a4f to 4cdca3e Compare August 1, 2023 11:43
@meyskens
Copy link
Contributor Author

meyskens commented Aug 1, 2023

/test-backport-1.13

Job 'Cilium-PR-K8s-1.17-kernel-4.19' failed:

Click to show.

Test Name

K8sAgentPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: connectivity-check pods are not ready after timeout

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.17-kernel-4.19/98/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.17-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@meyskens
Copy link
Contributor Author

meyskens commented Aug 1, 2023

Thrown away all work and started fresh given the architecture is way different than 1.12 :)

@meyskens meyskens marked this pull request as ready for review August 1, 2023 12:16
@meyskens meyskens requested a review from christarazi August 1, 2023 12:23
@nebril nebril mentioned this pull request Aug 9, 2023
2 tasks
@joestringer
Copy link
Member

@meyskens heads up that I ended up preparing a subtly different fix on v1.12 here, may be worth comparing for the approach on v1.13: #27406. I also extended your testing script here, which has two tests that now pass and two tests that still fail on all branches. For this bugfix it's fine to fix the first two test cases, then we can separately take a look at the second two test cases.

I know that there's more differences with v1.13 and this code has been heavily changing over time, so we'll need to be a bit careful around how we introduce the changes.

@meyskens meyskens force-pushed the meyskens/113-ipcache-meta-delete branch from 4cdca3e to e312165 Compare August 29, 2023 11:01
@meyskens meyskens marked this pull request as draft August 29, 2023 11:07
@github-actions
Copy link

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 29, 2023
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 29, 2023
@maintainer-s-little-helper
Copy link

Commit d564cb947bc65aa883504fe36c610301e164783c does not match "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 2, 2023
@meyskens meyskens force-pushed the meyskens/113-ipcache-meta-delete branch from d564cb9 to e312165 Compare October 2, 2023 10:17
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 2, 2023
This commit is based on v1.12 changes made in
067523f
It has the same added function calls but refactored to work with the
resource ID based metadata cache of v1.13.

Previously the node manager code only ever performed upserts into the
ipcache for remote-node labels on node creation/update events. However,
it never cleaned up these labels upon delete. As a result, if a node was
ever removed, then Cilium would continue to consider traffic towards
that IP as traffic reaching towards a remote node. In extreme cases,
this IP could be reallocated to another host in the network which could
cause connectivity disruption to that peer, particularly if the new peer
is subject to CIDR or Entities based policies.

Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com>
@meyskens meyskens force-pushed the meyskens/113-ipcache-meta-delete branch from e312165 to c8eed5e Compare October 2, 2023 10:17
@meyskens meyskens marked this pull request as ready for review October 2, 2023 10:25
Comment on lines -654 to -658
}

for _, address := range []net.IP{
entry.node.IPv4HealthIP, entry.node.IPv6HealthIP,
entry.node.IPv4IngressIP, entry.node.IPv6IngressIP,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the node gets deleted, this code still needs to remove the association for the old health and ingress IPs. Was it intentional to remove the iteration here for these additional addresses?

resource := ipcacheTypes.NewResourceID(ipcacheTypes.ResourceKindNode, "", n.Name)

m.removeFromIDMD(prefix, remoteHostIdentity, resource)
m.ipcache.Delete(ip, n.Source)
Copy link
Member

@joestringer joestringer Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be using the non-prefix representation of the string like 192.0.2.3, as compared with the prefix representation like 192.0.2.3/32 as like the call six lines up. However, NodeUpdated() doesn't seem to do an ipcache.Upsert(...) with this representation. So this seems to be deleting an association in the ipcache that was never created when the node was initially discovered? I don't think that this is correct.

@qmonnet
Copy link
Member

qmonnet commented Oct 16, 2023

I'm turning the PR to draft, please mark as ready again once it's unblocked and ready to move forwards.

@qmonnet qmonnet marked this pull request as draft October 16, 2023 09:07
@mathpl mathpl added the area/agent Cilium agent related. label Oct 26, 2023
@christarazi
Copy link
Member

Superseded by #28972

@christarazi christarazi closed this Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/agent Cilium agent related. kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants