ipcache: move CIDR restoration to asynchronous APIs #28673

squeed · 2023-10-18T10:29:53Z

This PR contains a key insight: we don't need to insert restored CIDRs directly into the ipcache. Rather, we just need to ensure their identities are not used by any other prefixes.

This PR accomplishes this by adding a new identity reservation mechanism to the local allocator. It then plumbs up the ability to request a reserved (i.e. restored) identity when inserting into the ipcache. The primary benefit of this approach is that we don't have to try to back-fill labels based on a dump of the ipcache. Rather, we just reserve already-existing numeric identities and let all the usual ipcache writers do their thing. When the initial k8s sync is finished and ipcache writing is allowed, all the restored identities should have the exact same labels they had before agent restart, and all without any fancy logic!

This PR has one significant immediate benefit: the reserved:kube-apiserver label no longer flaps on restart 🎉 .

The previous flow:

1. Dump the ipcache, listing (prefix -> id) pairs.
2. Upsert each prefix -> id pair directly into the ipcache, guessing at the set of labels (cidr:xxxx, ... reserved:world). At this point, a numeric ID that previously had the kube-apiserver label has lost it.
3. Perform a k8s sync, adding the reserved:kube-apiserver label back to the IP.
4. Do an ipcache.UpsertLabels(), which causes the kube-apiserver CIDR to allocate a new numeric ID. This is because the set of labels guessed for the CIDR in step 2 didn't include the kube-apiserver label.

The new flow:

1. Dump the ipcache, withholding every dumped ID from general allocation.
2. Upsert the prefix into the ipcache metadata layer, tagging it as having a restored identity.
3. Perform a k8s sync, adding in all relevant metadata / labels.
4. Do an ipcache.UpsertLabels(), which will allocate an identity with the same set of labels as before, and use the previous numeric ID for that prefix.
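
The new flow can be sketched with a toy allocator. This is a minimal illustration, not the code in this PR: the `allocator` type, `restore()`, the label strings, and the small numeric IDs are all assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allocator is a toy local identity allocator, not Cilium's implementation.
type allocator struct {
	byLabels map[string]uint32 // label set -> numeric ID
	used     map[uint32]bool   // IDs currently allocated
	withheld map[uint32]bool   // restored IDs kept out of the general pool
	next     uint32            // next candidate for an unrequested allocation
}

func newAllocator(first uint32) *allocator {
	return &allocator{
		byLabels: map[string]uint32{},
		used:     map[uint32]bool{},
		withheld: map[uint32]bool{},
		next:     first,
	}
}

// labelKey builds an order-independent key for a label set.
func labelKey(labels []string) string {
	sorted := append([]string(nil), labels...)
	sort.Strings(sorted)
	return strings.Join(sorted, ";")
}

// allocate returns the ID for labels. A requested ID is honored only if it is
// free; requested == 0 means "no preference".
func (a *allocator) allocate(labels []string, requested uint32) uint32 {
	key := labelKey(labels)
	if id, ok := a.byLabels[key]; ok {
		return id // labels already allocated: the request is ignored
	}
	id := requested
	if id == 0 || a.used[id] {
		// Fall back to the next ID that is neither in use nor withheld.
		for a.used[a.next] || a.withheld[a.next] {
			a.next++
		}
		id = a.next
	}
	a.used[id] = true
	delete(a.withheld, id)
	a.byLabels[key] = id
	return id
}

// restore replays the new flow for a dump of (prefix -> ID) pairs.
func restore(a *allocator, dump map[string]uint32, synced map[string][]string) map[string]uint32 {
	requested := map[string]uint32{}
	for prefix, id := range dump {
		a.withheld[id] = true  // step 1: keep the ID away from other prefixes
		requested[prefix] = id // step 2: remember it in the metadata layer
	}
	// Step 3 (the k8s sync) is represented by the synced label sets.
	out := map[string]uint32{}
	for prefix, labels := range synced {
		out[prefix] = a.allocate(labels, requested[prefix]) // step 4
	}
	return out
}

func main() {
	a := newAllocator(100)
	dump := map[string]uint32{"10.0.0.1/32": 100, "192.0.2.0/24": 101}
	synced := map[string][]string{
		"10.0.0.1/32":     {"cidr:10.0.0.1/32", "reserved:kube-apiserver", "reserved:world"},
		"192.0.2.0/24":    {"cidr:192.0.2.0/24", "reserved:world"},
		"198.51.100.0/24": {"cidr:198.51.100.0/24", "reserved:world"},
	}
	fmt.Println(restore(a, dump, synced))
}
```

In this sketch the two dumped prefixes keep their previous IDs regardless of the order in which prefixes are processed, and the prefix that was not in the dump is given an ID outside the withheld set.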

As a bonus, we can remove the special hack for re-creating node-cidr labels. That was added in resolveIdentity() but it really didn't belong there.

If we decide this is the right way forward, outstanding items are:

- Add a test that ensures identities remain constant after restarts
- Clean up code comments
- Test extensively with FQDN policies

squeed · 2023-10-18T10:31:15Z

I had an idea for how we might do this, and it only took a few hours to sketch out and implement. Looking for feedback, especially from @joestringer @jrajahalme

squeed · 2023-10-18T12:43:49Z

/test

squeed · 2023-10-19T08:19:00Z

Not that this is ready to go, but CI is green :-)

joestringer

Seems like a reasonable approach in general, though I suspect in the details the current draft isn't quite working as intended due to the lack of unreserve before allocation, see feedback below.

I might suggest splitting the identity allocation logic & tests into a dedicated commit for easier subsequent review & to keep small commits, but that's as much personal preference as anything.

daemon/cmd/daemon.go

daemon/cmd/daemon_main.go

pkg/ipcache/metadata.go

pkg/identity/cache/local.go

squeed · 2023-10-26T13:48:02Z

@joestringer I updated the PR based on your feedback. No big changes, just some small cleanups. The biggest change is explicitly managing reservation lifetime.

I also renamed the metadata-bits from RestoredIdentity to RequestedIdentity to better match semantics.

squeed · 2023-10-26T13:48:21Z

/test

squeed · 2023-10-30T12:25:33Z

/test

pkg/ipcache/ipcache.go

squeed · 2023-11-01T12:02:25Z

/test

joestringer

I went back through, logically it seems sound to me. I spaced out a little bit going through the main restoreIPCache() logic patch due to the combination of refactor + logical change, so that could do with a revisit. I did notice some discrepancies there that need to be addressed, and ideally that should be split into two patches. Other than that it's mostly minor nits around function comments and logs, and one unnecessary goroutine. The corner case handling of identity allocation from the reserved range when under identity pressure wasn't entirely clear to me either. We can discuss further in the threads below.

pkg/ipcache/ipcache.go

pkg/identity/cache/local.go

pkg/ipcache/cidr.go

daemon/cmd/daemon.go

daemon/cmd/daemon_main.go

daemon/cmd/ipcache.go

squeed · 2023-11-06T16:01:38Z

/test

christarazi

I was stress-testing this PR with FQDN proxy and found a nasty bug where prefixes were incorrectly removed from the BPF ipcache while still in use with FQDN. The problem is that the FQDN proxy does not do an UpsertGeneratedIdentities() if an identity is not new, and thus the source of a prefix is not "upgraded".

Luckily, it was an easy fix: since the set of all referenced identities is passed to UpsertGeneratedIdentities(), we can overwrite createdFromMetadata retroactively and prevent deletion.

I've added a new commit and test case for this scenario.

Nice find. A couple of questions:

What does the stress testing look like? I think it would be valuable to upstream it at some point so when we have an infrastructure for stress testing in the future, we have the FQDN proxy covered.
Does this bug exist in main or was it something that arose from what this PR is solving? If it's the former, then I would suggest a separate PR so that it can be backported separately from this PR. Additionally, do we have an idea on how we missed it before?

test/helpers/kubectl.go

squeed · 2023-11-06T19:34:37Z

What does the stress testing look like? I think it would be valuable to upstream it at some point so when we have an infrastructure for stress testing in the future, we have the FQDN proxy covered.

Nothing exciting, just ensuring that fqdn policies still work while spamming restarts. We have a test case that covers this already, it just needs to be updated to wait for the restore grace period to stop.

Does this bug exist in main or was it something that arose from what this PR is solving? If it's the former, then I would suggest a separate PR so that it can be backported separately from this PR. Additionally, do we have an idea on how we missed it before?

I don't think so. The specific interaction that's problematic is:

1. UpsertMetadata
2. AllocateCIDRs & UpsertGeneratedIdentities
3. RemoveMetadata

Moving CIDR restoration over to the new API is what triggered this. That said, I'm sure there's some permutation that I'm not thinking of, so it can't hurt to split out.
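
A toy model of that interaction, for illustration only. The `cache` type, its lower-case methods, and the `fresh`/`referenced` split are assumptions made for this sketch; only the `createdFromMetadata` name comes from the discussion above, and the real signatures differ.

```go
package main

import "fmt"

// entry records whether a prefix exists only because of a metadata upsert.
type entry struct {
	createdFromMetadata bool
}

// cache is a toy stand-in for the ipcache, not Cilium's implementation.
type cache struct {
	entries     map[string]*entry
	retroactive bool // when true, apply the fix described in this thread
}

// upsertMetadata creates the prefix if no one else has created it yet.
func (c *cache) upsertMetadata(prefix string) {
	if _, ok := c.entries[prefix]; !ok {
		c.entries[prefix] = &entry{createdFromMetadata: true}
	}
}

// upsertGenerated models a caller (for example the FQDN path) referencing
// prefixes. fresh holds prefixes whose identity was newly allocated;
// referenced holds every prefix the caller uses, new or not.
func (c *cache) upsertGenerated(fresh, referenced []string) {
	for _, p := range fresh {
		c.entries[p] = &entry{createdFromMetadata: false}
	}
	if !c.retroactive {
		return
	}
	// The fix: clear the flag for every referenced prefix, not only new ones.
	for _, p := range referenced {
		if e, ok := c.entries[p]; ok {
			e.createdFromMetadata = false
		}
	}
}

// removeMetadata deletes the prefix only if metadata was its sole owner.
func (c *cache) removeMetadata(prefix string) {
	if e, ok := c.entries[prefix]; ok && e.createdFromMetadata {
		delete(c.entries, prefix)
	}
}

// survives runs the three-step sequence and reports whether the prefix is
// still present afterwards.
func survives(retroactive bool) bool {
	c := &cache{entries: map[string]*entry{}, retroactive: retroactive}
	const prefix = "192.0.2.1/32"
	c.upsertMetadata(prefix)
	c.upsertGenerated(nil, []string{prefix}) // identity is not new
	c.removeMetadata(prefix)
	_, ok := c.entries[prefix]
	return ok
}

func main() {
	fmt.Println("without fix:", survives(false))
	fmt.Println("with fix:   ", survives(true))
}
```

Without the retroactive overwrite the prefix is deleted in step 3 even though it is still referenced; with it, the prefix survives.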

squeed · 2023-11-06T19:35:43Z

Test failures look interesting. One of them is a bug in my new test, the other is ensuring fqdn works after restart. The latter is failing, which is concerning. Works On My Machine, of course.

squeed · 2023-11-06T20:20:47Z

Found another missing case where we were stepping on toes, this time when allocating identities after a restart. Phew. That should fix it.

squeed · 2023-11-06T20:20:58Z

/test

squeed · 2023-11-08T09:03:29Z

CI was green except a broken Travis. Rebased on main.

squeed · 2023-11-08T09:16:51Z

/test

christarazi

Just one nit to fix up.

pkg/ipcache/ipcache.go

They do the same thing as their scalar equivalents, but can perform updates to multiple prefixes with a single lock, which saves contention. Also, batch up calls to enqueuePrefixUpdates() where relevant. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
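
A minimal sketch of the batching idea, assuming a hypothetical `metadata` type with string prefixes and labels. None of these names are the real ipcache API, and `enqueue()` only stands in for enqueuePrefixUpdates().

```go
package main

import (
	"fmt"
	"sync"
)

// update is one (prefix, label) pair to record.
type update struct {
	prefix, label string
}

// metadata is a toy metadata store, not Cilium's implementation.
type metadata struct {
	mu     sync.Mutex
	labels map[string][]string
	locks  int // lock acquisitions, counted for illustration

	queueMu sync.Mutex
	queued  [][]string // one element per enqueue call
}

func newMetadata() *metadata {
	return &metadata{labels: map[string][]string{}}
}

// upsertLocked must be called with m.mu held.
func (m *metadata) upsertLocked(u update) {
	m.labels[u.prefix] = append(m.labels[u.prefix], u.label)
}

// enqueue stands in for enqueuePrefixUpdates(): it records one call.
func (m *metadata) enqueue(prefixes []string) {
	m.queueMu.Lock()
	defer m.queueMu.Unlock()
	m.queued = append(m.queued, prefixes)
}

// upsert is the scalar form: one lock and one enqueue per prefix.
func (m *metadata) upsert(u update) {
	m.mu.Lock()
	m.locks++
	m.upsertLocked(u)
	m.mu.Unlock()
	m.enqueue([]string{u.prefix})
}

// upsertBatch applies all updates under a single lock and enqueues once.
func (m *metadata) upsertBatch(updates ...update) {
	prefixes := make([]string, 0, len(updates))
	m.mu.Lock()
	m.locks++
	for _, u := range updates {
		m.upsertLocked(u)
		prefixes = append(prefixes, u.prefix)
	}
	m.mu.Unlock()
	m.enqueue(prefixes)
}

func main() {
	updates := []update{
		{"10.0.0.1/32", "reserved:kube-apiserver"},
		{"10.0.0.2/32", "reserved:world"},
		{"10.0.0.3/32", "reserved:world"},
	}

	scalar := newMetadata()
	for _, u := range updates {
		scalar.upsert(u)
	}
	batched := newMetadata()
	batched.upsertBatch(updates...)

	fmt.Println("scalar locks:", scalar.locks, "enqueues:", len(scalar.queued))
	fmt.Println("batch locks: ", batched.locks, "enqueues:", len(batched.queued))
}
```

Both forms leave the same labels in the store; the batch form simply takes the lock and enqueues once for the whole set.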

This allows local numeric identities to be held out of the pool for allocation. Instead, they can only be used when explicitly requested, e.g. when supplied via the oldNID parameter to Allocate(). The purpose of this is to allow for a stable prefix -> nID mapping, even on agent restarts. As part of the startup process, existing identities will be reserved so that existing prefixes can "claim" them (in a subsequent commit). Signed-off-by: Casey Callendrello <cdc@isovalent.com>
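
The withholding behavior can be sketched as follows. This is an illustration under assumptions, not the code in pkg/identity/cache/local.go: the `pool` type, its method names, the tiny ID range, and the choice to return an error when only withheld IDs remain are all made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// pool is a toy allocator over the numeric range [min, max].
type pool struct {
	min, max uint32
	used     map[uint32]bool // IDs currently allocated
	withheld map[uint32]bool // IDs held out of general allocation
}

var errExhausted = errors.New("no free identity outside the withheld set")

func newPool(min, max uint32) *pool {
	return &pool{min: min, max: max, used: map[uint32]bool{}, withheld: map[uint32]bool{}}
}

// withhold keeps ids out of general allocation until requested or released.
func (p *pool) withhold(ids ...uint32) {
	for _, id := range ids {
		if !p.used[id] {
			p.withheld[id] = true
		}
	}
}

// unwithhold returns ids that were never claimed to the general pool.
func (p *pool) unwithhold(ids ...uint32) {
	for _, id := range ids {
		delete(p.withheld, id)
	}
}

// allocate hands out requested if it is in range and free, even when it is
// withheld. Otherwise it falls back to the lowest free, non-withheld ID.
// requested == 0 means "no preference" (min is assumed to be at least 1).
func (p *pool) allocate(requested uint32) (uint32, error) {
	if requested >= p.min && requested <= p.max && !p.used[requested] {
		delete(p.withheld, requested)
		p.used[requested] = true
		return requested, nil
	}
	for id := p.min; id <= p.max; id++ {
		if !p.used[id] && !p.withheld[id] {
			p.used[id] = true
			return id, nil
		}
	}
	return 0, errExhausted
}

func main() {
	p := newPool(1, 4)
	p.withhold(1, 2)

	a, _ := p.allocate(0)   // skips the withheld IDs
	b, _ := p.allocate(2)   // explicit request claims a withheld ID
	c, _ := p.allocate(2)   // already taken, so the request is ignored
	_, err := p.allocate(0) // only a withheld ID is left
	p.unwithhold(1)
	d, _ := p.allocate(0) // released ID is available again

	fmt.Println(a, b, c, err, d)
}
```

Releasing unclaimed IDs with `unwithhold()` corresponds to managing the reservation lifetime explicitly, so that withheld IDs do not stay out of the pool forever.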

This adds an additional ipcache metadata field: the preferred numeric identity for a prefix. The local allocators already allow for requesting a given numeric ID for an identity (set of labels). This is useful when restoring prefix -> identity mappings on daemon restart, where we would like to end up with the same ipcache state as before the restart. When allocating an identity for a prefix with RestoredIdentity, the ipcache will now request that identity. The allocators already handle the case where a requested numeric ID is taken -- the request is merely ignored. So this is always safe to do. Likewise, if a set of labels is already allocated, the requested ID is ignored. Signed-off-by: Casey Callendrello <cdc@isovalent.com>

Now that we can withhold and request specific numeric identities, we can transition the ipcache restoration logic over. This change makes it so that prefixes request whatever previous numeric identity they had on restart. Once the watchers are synchronized, the ipcache will proceed with label injection, and the state should exactly match what it was before. Additionally, update some tests that mimicked the now-removed restoration logic. Signed-off-by: Casey Callendrello <cdc@isovalent.com>

This parameter is no longer needed, so let's reduce the confusion surface. Signed-off-by: Casey Callendrello <cdc@isovalent.com>

This adds an additional check to an existing test that ensures local (cidr) identities are stable after agent restart. Signed-off-by: Casey Callendrello <cdc@isovalent.com>

squeed · 2023-11-08T19:06:07Z

/test

squeed · 2023-11-09T09:48:52Z

All approvals in, only one non-required CI job failing, MLH has added ready-to-merge. Merging.

squeed requested a review from joestringer October 18, 2023 10:29

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Oct 18, 2023

squeed requested a review from jrajahalme October 18, 2023 12:47

joestringer mentioned this pull request Oct 20, 2023

Convert daemon identity restore logic over to newer IPCache APIs #27255

Closed

11 tasks

joestringer reviewed Oct 24, 2023

daemon/cmd/daemon.go

daemon/cmd/daemon_main.go

pkg/ipcache/metadata.go

pkg/identity/cache/local.go

joestringer added the release-note/misc label (This PR makes changes that have no direct user impact.) Oct 24, 2023

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Oct 24, 2023

joestringer mentioned this pull request Oct 24, 2023

Rework ipcache to handle metadata from multiple sources via a dedicated worker goroutine #21142

Open

29 tasks

squeed force-pushed the cidr-restore-refactor branch from f8c0dcd to 9c71941 Compare October 26, 2023 12:45

squeed force-pushed the cidr-restore-refactor branch from 9c71941 to 8c0c7e8 Compare October 30, 2023 11:41

christarazi reviewed Oct 30, 2023

pkg/ipcache/ipcache.go

squeed self-assigned this Oct 31, 2023

squeed force-pushed the cidr-restore-refactor branch from 8c0c7e8 to 5a0e3b8 Compare October 31, 2023 20:29

squeed marked this pull request as ready for review October 31, 2023 20:29

squeed requested review from a team as code owners October 31, 2023 20:29

squeed requested a review from christarazi October 31, 2023 20:29

christarazi reviewed Oct 31, 2023

pkg/ipcache/ipcache.go

squeed force-pushed the cidr-restore-refactor branch from 5a0e3b8 to 63f584a Compare November 1, 2023 08:54

joestringer requested changes Nov 1, 2023


squeed force-pushed the cidr-restore-refactor branch from 7136bcd to 1720bbf Compare November 6, 2023 15:49

squeed removed the request for review from jrajahalme November 6, 2023 16:15

christarazi reviewed Nov 6, 2023

test/helpers/kubectl.go

squeed force-pushed the cidr-restore-refactor branch from 1720bbf to 15e2d35 Compare November 6, 2023 20:15

squeed mentioned this pull request Nov 6, 2023

ipcache: keep upserted prefixes from being deleted by InjectLabels #29014

Merged

squeed force-pushed the cidr-restore-refactor branch from 15e2d35 to 97f18f9 Compare November 8, 2023 09:03

maintainer-s-little-helper bot added the ready-to-merge label (This PR has passed all tests and received consensus from code owners to merge.) Nov 8, 2023

christarazi approved these changes Nov 8, 2023

pkg/ipcache/ipcache.go

maintainer-s-little-helper bot added the ready-to-merge label (This PR has passed all tests and received consensus from code owners to merge.) Nov 8, 2023

squeed added 6 commits November 8, 2023 20:05

ipcache: remove oldNID from AllocateCIDRs

a4aa30c


test: check that CIDR identities are stable after agent restart

9f2927a


squeed force-pushed the cidr-restore-refactor branch from 97f18f9 to 9f2927a Compare November 8, 2023 19:05

squeed merged commit ee810d7 into cilium:main Nov 9, 2023
