Skip to content

Conversation

HirazawaUi
Copy link
Contributor

@HirazawaUi HirazawaUi commented Mar 24, 2025

When the RunPodSandbox interface fails due to an upstream context cancellation or other reasons, HandleEvent encounters a nil pointer because the PodSandbox.Container field is nil, resulting in a panic.

I discovered this issue in the kubernetes/kubernetes repository.

PR kubernetes/kubernetes#127122 aims to address this by passing the context to the container runtime, resolving the issue where container creation and image pulling actions cannot be canceled.

Previously, the tests in this PR worked fine, but after the CI in the k/k repository switched to containerd 2.0, many tests in this PR started failing.

While investigating the error, I captured the following error in the containerd logs:

Mar 23 14:25:14 kind-worker containerd[15009]: time="2025-03-23T14:25:14.978696862Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:pod-terminate-status-1-4,Uid:ee88607b-7c0e-4c4b-899b-f26c423d7ea1,Namespace:pods-6426,Attempt:0,} failed, error" error="rpc error: code = Unknown desc = failed to start sandbox \"14688f874e1d01af706ebe5dca7e38b1d3678bf6a444bd29a35d5a8cee022541\": failed to create containerd task: failed to create shim task: context canceled"
Mar 23 14:25:14 kind-worker containerd[15009]: time="2025-03-23T14:25:14.980825885Z" level=info msg="TaskExit event in podsandbox handler container_id:\"14688f874e1d01af706ebe5dca7e38b1d3678bf6a444bd29a35d5a8cee022541\"  id:\"14688f874e1d01af706ebe5dca7e38b1d3678bf6a444bd29a35d5a8cee022541\"  pid:196281  exit_status:137  exited_at:{seconds:1742739914  nanos:726692665}"
Mar 23 14:25:14 kind-worker containerd[15009]: time="2025-03-23T14:25:14.981209111Z" level=info msg="received exit event sandbox_id:\"9e49cf94f1b270f368e72a76c249a226623cc959522293f36bc2b65f5272a310\"  exit_status:137  exited_at:{seconds:1742739914  nanos:687639430}"
Mar 23 14:25:14 kind-worker containerd[15009]: panic: runtime error: invalid memory address or nil pointer dereference
Mar 23 14:25:14 kind-worker containerd[15009]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x55f4396c71fa]
Mar 23 14:25:14 kind-worker containerd[15009]: goroutine 373 [running]:
Mar 23 14:25:14 kind-worker containerd[15009]: github.com/containerd/containerd/v2/internal/cri/server/podsandbox.handleSandboxTaskExit({0x55f439ff5db0, 0xc00042eee0}, 0xc002da8a80, 0xc003d33860)
Mar 23 14:25:14 kind-worker containerd[15009]:         /containerd/internal/cri/server/podsandbox/controller.go:192 +0x3a
Mar 23 14:25:14 kind-worker containerd[15009]: github.com/containerd/containerd/v2/internal/cri/server/podsandbox.(*podSandboxEventHandler).HandleEvent(0xc00007a8c0, {0x55f439ef9e80?, 0xc003d33860})
Mar 23 14:25:14 kind-worker containerd[15009]:         /containerd/internal/cri/server/podsandbox/events.go:55 +0x149
Mar 23 14:25:14 kind-worker containerd[15009]: github.com/containerd/containerd/v2/internal/cri/server/events.(*EventMonitor).Start.func1()
Mar 23 14:25:14 kind-worker containerd[15009]:         /containerd/internal/cri/server/events/events.go:155 +0x422
Mar 23 14:25:14 kind-worker containerd[15009]: created by github.com/containerd/containerd/v2/internal/cri/server/events.(*EventMonitor).Start in goroutine 10
Mar 23 14:25:14 kind-worker containerd[15009]:         /containerd/internal/cri/server/events/events.go:135 +0xb2
Mar 23 14:25:15 kind-worker systemd[1]: containerd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

@k8s-ci-robot
Copy link

Hi @HirazawaUi. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@samuelkarp
Copy link
Member

FAIL - does not have a valid DCO

Please sign-off your commit.

Signed-off-by: BING HONGTAO <695097494plus@gmail.com>
@HirazawaUi
Copy link
Contributor Author

Please sign-off your commit.

Signed.

@samuelkarp
Copy link
Member

/ok-to-test

@samuelkarp samuelkarp added the cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch label Mar 24, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Mar 24, 2025
@dims dims added this pull request to the merge queue Mar 24, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 24, 2025
@ningmingxiao
Copy link
Contributor

ningmingxiao commented Mar 24, 2025

why not add check in handleSandboxTaskExit? why sb.Container = nil ? @HirazawaUi

@HirazawaUi
Copy link
Contributor Author

why not add check in handleSandboxTaskExit? why sb.Container = nil ? @HirazawaUi

ref: #11588 (comment)

@ningmingxiao
Copy link
Contributor

ningmingxiao commented Mar 24, 2025

I find will call handleSandboxTaskExit twice, I don't know why. It should be anoter bug.

@HirazawaUi
Copy link
Contributor Author

I find will call handleSandboxTaskExit twice, I don't know why. It should be anoter bug.

In the logs I’ve observed, every panic seems to stem from a call to HandleEvent -> handleSandboxTaskExit, where a panic occurs. Additionally, in all other callers of handleSandboxTaskExit, the Container is never nil. I think handling nil in either HandleEvent or handleSandboxTaskExit would work fine, and as noted in the comment #11588 (comment), it’s really just a matter of code preference.

If you think there’s anything incorrect in my response, I’d greatly appreciate it if you could point it out. That said, I’m a bit puzzled as to why you opened another PR to address this issue, especially when there doesn’t seem to be a clear difference between the two PRs.

@ningmingxiao
Copy link
Contributor

ningmingxiao commented Mar 24, 2025

I close my pr,it should be a nother bug

@yylt
Copy link
Contributor

yylt commented Mar 26, 2025

Will this cover up other issues? In fact, this paragraph was moved from https://github.com/containerd/containerd/blob/v1.7.27/pkg/cri/server/sandbox_run.go#L196C1-L205C5

	defer func() {
		// Delete container only if all the resource cleanup is done.
		if retErr != nil && cleanupErr == nil {
			deferCtx, deferCancel := util.DeferContext()
			defer deferCancel()
			if cleanupErr = container.Delete(deferCtx, containerd.WithSnapshotCleanup); cleanupErr != nil {
				log.G(ctx).WithError(cleanupErr).Errorf("Failed to delete containerd container %q", id)
			}
+      podSandbox.Container = nil
		}
	}()

But after the move, a nil was added. I don't know why this sentence was added here, but it looks different from the original processing.

If we make changes here, may similar problems actually occur in other unknown places?

@HirazawaUi
Copy link
Contributor Author

@estesp It seems some test job failed due to some unrelated reasons, preventing this PR from being successfully merged. Is there anything I can do at this point to help? Or should I wait for other maintainers to re-add it to merge queue?

@mxpv mxpv added this pull request to the merge queue Mar 30, 2025
Merged via the queue into containerd:main with commit 009d8be Mar 30, 2025
59 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Mar 30, 2025
@brandond
Copy link
Contributor

brandond commented Jul 1, 2025

Why hasn't this been backported to the 2.0 branch yet?

@HirazawaUi
Copy link
Contributor Author

Why hasn't this been backported to the 2.0 branch yet?

Thank you for the reminder, I’ll cherry-pick this to the 2.0 branch.

@HirazawaUi
Copy link
Contributor Author

@brandond PR has been opened.

ref: #12047

@austinvazquez austinvazquez added cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch and removed cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch labels Jul 1, 2025
@brandond
Copy link
Contributor

brandond commented Jul 1, 2025

Thanks. How about getting #11576 / b9ab7a3 merged and backported too? We've had this pulled into our fork for months, containerd is basically unusable without it for a large subset of our users.

It looks like 2.0 hasn't seen a patch release tagged since 2.1 came out, is this intentional? @austinvazquez @estesp

@estesp
Copy link
Member

estesp commented Jul 1, 2025

It looks like 2.0 hasn't seen a patch release tagged since 2.1 came out, is this intentional?

Definitely not intentional. We attempt to properly mark PRs for cherry-pick that are bug fixes, but we are definitely imperfect. I think having two 2.x lines of support has made things a bit more challenging lately, but hopefully we can make sure to get 2.0.x in better shape and cut a release soon.

We tend to be driven by customer demand on minor/point releases, so if there is ever a time you feel like a release is warranted please open an issue or ping us on #containerd-dev Slack channel.

brandond pushed a commit to brandond/containerd that referenced this pull request Jul 1, 2025
Fix the panic caused by the failure of RunPodSandbox

(cherry picked from commit 009d8be)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to brandond/k3s that referenced this pull request Jul 11, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to brandond/k3s that referenced this pull request Jul 11, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to brandond/k3s that referenced this pull request Jul 11, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to k3s-io/k3s that referenced this pull request Jul 11, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to brandond/k3s that referenced this pull request Jul 11, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 0d48c8e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to brandond/k3s that referenced this pull request Jul 14, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 0d48c8e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to brandond/k3s that referenced this pull request Jul 14, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 0d48c8e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to k3s-io/k3s that referenced this pull request Jul 15, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 0d48c8e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to k3s-io/k3s that referenced this pull request Jul 15, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 0d48c8e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
brandond added a commit to k3s-io/k3s that referenced this pull request Jul 15, 2025
Adds backports of containerd/containerd#11588 and containerd/containerd#12053

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 0d48c8e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cri Container Runtime Interface (CRI) cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch kind/bug ok-to-test size/XS
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.