-
Notifications
You must be signed in to change notification settings - Fork 3.6k
CRI: Properly restore IP information for userns sandboxes #10400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Skipping CI for Draft Pull Request. |
b2113dc
to
51dbc98
Compare
/retest |
51dbc98
to
c148cc0
Compare
c56d3d0
to
7bf5940
Compare
12fe520
to
b5a0bad
Compare
b5a0bad
to
82fdfad
Compare
@fuweid Sorry for delay, un-drafted this and added an integration test. Only some of the CI machines support userns in full so I added a log so we can grep for if the testcase is actually being hit. "restart_linux_test.go:36: adding userns pods for containerd restart test" |
82fdfad
to
0feb455
Compare
/retest |
Fixes: containerd#10363 Due to when we setup networking for userns containers (AFTER the OCI runtime has ran), this was causing the underlying object for the sandbox to not have the IP information saved in the metadata extension. Because we're trying to move away from modifying the underlying container object for sandboxes (or even just keeping around a reference to it), I don't want to update the underlying container afterwards to include this. So, to remedy this we can update the sandboxes metadata after network setup runs for userns containers, and then our recovery logic just needs to be altered a bit. Our recovery logic already checks containers with the sandbox label first, and then checks our shiny new sandbox store. Our IP information is stored in the sandbox store objects, so all we need to do is check if it exists, and if so and we already found a container for it (although without the IP information) then just update the sandbox in our in-memory store. Signed-off-by: Danny Canter <danny@dcantah.dev>
Add a new case to TestContainerdRestartSandboxRecover that will launch a userns pod if the host and runtime supports it. The new case will just test that we have networking info in the pod metadata on restart. Signed-off-by: Danny Canter <danny@dcantah.dev>
0feb455
to
3f9c70c
Compare
@dcantah: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@@ -358,6 +350,15 @@ func (c *criService) RunPodSandbox(ctx context.Context, r *runtime.RunPodSandbox | |||
sandboxCreateNetworkTimer.UpdateSince(netStart) | |||
} | |||
|
|||
if err := sandboxInfo.AddExtension(podsandbox.MetadataKey, &sandbox.Metadata); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should move these two updates AddExtension
and SandboxStore.Update
into the end of if !hostNetwork(config) && userNsEnabled
because non-user-ns case doesn't need to update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me. When I'm by my computer I'll see if there's any gotchas
@@ -81,7 +82,7 @@ func (c *criService) recover(ctx context.Context) error { | |||
return nil | |||
} | |||
log.G(ctx2).Debugf("Loaded sandbox %+v", sb) | |||
if err := c.sandboxStore.Add(sb); err != nil { | |||
if err := c.sandboxStore.Add(sb); err != nil && !errors.Is(err, errdefs.ErrAlreadyExists) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it related to this fix?
@@ -100,12 +101,18 @@ func (c *criService) recover(ctx context.Context) error { | |||
return fmt.Errorf("failed to list sandboxes from API: %w", err) | |||
} | |||
for _, sbx := range storedSandboxes { | |||
if _, err := c.sandboxStore.Get(sbx.ID); err == nil { | |||
sb, err := c.sandboxStore.Get(sbx.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: we should merge metadata-db.sandbox-store.metadata with metadata-db.container.metadata in function podSandboxLoader.RecoverContainer
. This loop is designed for the sandboxes which are managed by remote sandbox plugin. I think we should not involve two sources in one loop.
In the RecoverContainer
function, we should prefer to use metadata from metadata-db.sandbox. What do you think?
I found that the sandboxes bucket wasn't in v1
. Should we add the version before namespace for sandbox?
#10467
containerd/core/metadata/buckets.go
Lines 309 to 315 in 67a0efc
func getSandboxBucket(tx *bolt.Tx, namespace string) *bolt.Bucket { | |
return getBucket( | |
tx, | |
[]byte(namespace), | |
bucketKeyObjectSandboxes, | |
) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but that seemed a big shift for this PR. This "use the underlying container for the given sandbox" vs "use the sandbox api as the source of truth to grab the sandboxes" is very confusing atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the underlying container for the given sandbox
It's to recover sandbox created by previous release actually.
but that seemed a big shift for this PR
OK. What about adding a condition that if IP
is empty, we should use metadata from sandbox bucket 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can give that a try!
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Removing from v2.0 milestone as staled |
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed. |
This PR was closed because it has been stalled for 7 days with no activity. |
Fixes: #10363
Due to when we setup networking for userns containers (AFTER the OCI runtime has ran), this was causing the underlying object for the sandbox to not have the IP information saved in the metadata extension. Because we're trying to move away from modifying the underlying container object for sandboxes (or even just keeping around a reference to it), I don't want to update the underlying container afterwards to include this. So, to remedy this we can update the sandboxes metadata after network setup runs for userns containers, and then our recovery logic just needs to be altered a bit.
Our recovery logic already checks containers with the sandbox label first, and then checks our shiny new sandbox store. Our IP information is stored in the sandbox store objects, so all we need to do is check if it exists, and if so and we already found a container for it (although without the IP information) then just update the sandbox in our in-memory store.