Contributor

@abel-von abel-von commented Oct 10, 2022

Hi, I think introducing the sandbox API to containerd in #4131 is an excellent idea, so I have been following all the code changes for this feature, and I also have some ideas about it.

I think we can introduce the concept of a "Sandboxer" to containerd, analogous to a "Snapshotter": every sandbox management operation should specify a "Sandboxer". We can provide different kinds of sandboxers, such as a hypervisor-based sandboxer, a cgroup/namespace-based sandboxer like runc, and wasm-based sandboxers that may appear in the future.

I think with this "Sandboxer" introduced, we can completely remove the "shim". Right now we have to launch a shim process for every pod, which is an overhead that cannot be ignored, but with a sandboxer the shim process may be removed.
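To make the analogy concrete, the "Sandboxer" idea above can be sketched as a small Go plugin interface, similar in spirit to how a Snapshotter is plugged into containerd. This is only an illustrative sketch; the names Sandboxer, Start, Stop, and TaskAddress are my assumptions, not the actual API in this PR.

```go
package main

import "fmt"

// Sandbox describes a running sandbox instance. TaskAddress is where the
// sandbox exposes its Task API (a unix socket, vsock, or TCP address).
type Sandbox struct {
	ID          string
	TaskAddress string
}

// Sandboxer manages sandbox lifecycle, analogous to how a Snapshotter
// manages snapshots. (Hypothetical interface, for illustration only.)
type Sandboxer interface {
	Start(id string) (*Sandbox, error)
	Stop(id string) error
}

// linuxSandboxer is a toy stand-in for the embedded "linux-sandbox"
// (runc/namespace-based) sandboxer described in this PR.
type linuxSandboxer struct{}

func (l *linuxSandboxer) Start(id string) (*Sandbox, error) {
	// A real implementation would launch a pause process and set up
	// namespaces; here we only fabricate the task address.
	return &Sandbox{ID: id, TaskAddress: "unix:///run/sandbox/" + id + ".sock"}, nil
}

func (l *linuxSandboxer) Stop(id string) error { return nil }

func main() {
	var s Sandboxer = &linuxSandboxer{}
	sb, _ := s.Start("pod-1")
	fmt.Println(sb.ID, sb.TaskAddress)
}
```

A hypervisor-based or wasm-based sandboxer would implement the same interface with a different Start/Stop strategy, which is the whole point of the abstraction.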

So I made some modifications to the sandbox API code, and it works for runc containers without a shim process (container status and IO can be recovered after a containerd restart). I also built our own hypervisor-based sandboxer, and it works as well.

To run this containerd, we have to prepare a "pause" executable and place it in the /bin folder, which is the default location where the runc sandboxer looks for it. We do this to remove the Sandboxer's dependency on the Snapshotter, so we no longer need a pause image to start a sandbox. Then we made a small change to the containerd config file:

[plugins.cri.containerd.runtimes.runc]
  runtime_type="io.containerd.runc.v2"
  sandboxer="linux-sandbox"

Then we can create sandboxes and containers with crictl or ctr. A sandboxes subcommand is introduced for ctr, with a --sandboxer parameter whose default value is the embedded linux-sandbox.

You can see that there is no longer a shim process for containers, while recovery of container status and IO still works (except for tty IO).

The details are explained in this doc:
https://docs.google.com/document/d/17_URs9UV3s_7xbaXCLYrCJf-0LhKLOD-NPKCsVdFpGM/edit

@k8s-ci-robot

Hi @abel-von. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@abel-von abel-von mentioned this pull request Oct 10, 2022
17 tasks
@abel-von abel-von force-pushed the support-sandbox-new branch from 3a1c2e6 to 07738d3 Compare October 10, 2022 13:46
Member

@mxpv mxpv left a comment

Thanks for opening this!
I have lots of questions :)

rpc Pause (ControllerPauseRequest) returns (google.protobuf.Empty);
rpc Resume (ControllerResumeRequest) returns (google.protobuf.Empty);
rpc Update (ControllerUpdateRequest) returns (ControllerUpdateResponse);
rpc AppendContainer (ControllerAppendContainerRequest) returns (ControllerAppendContainerResponse);
Member

Can you explain a bit more here?

In the current implementation we use tasks to run sandboxed containers (with sandbox_id to specify which sandbox instance the container belongs to); this way we have the task service to control a specific container's lifecycle.

Do we still use the task service?
Why do we want Controller to handle containers?
How to manage container's lifecycle?

Contributor Author

@abel-von abel-von Oct 13, 2022

A Sandboxer is much like a Snapshotter; it is for sandbox management. When the CRI API RunPodSandbox is called, containerd just calls the Sandboxer to start a new sandbox and returns; it no longer creates a pause container to imitate a sandbox. (Only in the runc scenario do we have to create a pause container to provide the sandbox namespaces, but this is now done inside the "runc sandboxer", so containerd still does not know about the existence of a pause container.) Every sandbox exposes its Task API through a "TaskAddress", which is either a unix domain socket, a vsock, or maybe even a TCP address. When we call CRI CreateContainer, containerd first calls AppendContainer to append the container-related resources to the sandbox, and then connects to the sandbox's "TaskAddress" to perform Task API calls.

Do we still use the task service?

So yes, we still use the task service; it is just that the task service is provided by the created sandbox.

Why do we want Controller to handle containers?

The AppendContainer/UpdateContainer/RemoveContainer APIs are for the sandbox to update its resources when a container in it is added/updated/deleted (maybe the method names could be changed to AppendContainerResource/UpdateContainerResource/RemoveContainerResource?). Maybe the Update API alone would be enough, but then containerd would have to calculate the sandbox's total resources whenever a container is added/updated/deleted and call the sandbox's Update API, and different sandbox implementations may have their own logic for updating sandbox resources based on the container spec, so I added these three methods to the Controller. AppendContainer is called when creating a container, UpdateContainer is called when exec'ing a process in a container or updating container resources, and RemoveContainer is of course called when deleting a container.
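To illustrate the tradeoff being described: with only a generic Update call, containerd itself would have to aggregate the sandbox total from all container specs, whereas AppendContainer lets each sandboxer apply its own policy, e.g. a hypervisor sandboxer adding fixed guest overhead. A toy aggregation, with all names and the overhead number hypothetical:

```go
package main

import "fmt"

// ContainerRes is a simplified view of a container's resource request.
type ContainerRes struct {
	MemoryBytes int64
	CPUMillis   int64
}

// sandboxTotal is what containerd would have to compute itself if the
// Controller only exposed a generic Update call: the plain sum of all
// container resources in the pod.
func sandboxTotal(cs []ContainerRes) ContainerRes {
	var t ContainerRes
	for _, c := range cs {
		t.MemoryBytes += c.MemoryBytes
		t.CPUMillis += c.CPUMillis
	}
	return t
}

// hypervisorTotal shows why a sandboxer may want to own this logic:
// a VM-based sandboxer might add a fixed per-guest memory overhead
// (the 128 MiB here is an arbitrary illustrative number).
func hypervisorTotal(cs []ContainerRes) ContainerRes {
	t := sandboxTotal(cs)
	t.MemoryBytes += 128 << 20
	return t
}

func main() {
	cs := []ContainerRes{
		{MemoryBytes: 256 << 20, CPUMillis: 500},
		{MemoryBytes: 512 << 20, CPUMillis: 250},
	}
	fmt.Println(sandboxTotal(cs), hypervisorTotal(cs))
}
```

Pushing this logic behind AppendContainer/UpdateContainer/RemoveContainer keeps such implementation-specific accounting out of containerd itself.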

How to manage a container's lifecycle?

Container lifecycle management still goes through Task API calls, as I mentioned above.

pkg/cri/cri.go Outdated
log.G(ctx).Info("using legacy CRI server")
s, err = server.NewCRIService(c, client)
}
s, err := server.NewCRIService(c, client)
Member

We use sbserver for all ongoing sandbox work on CRI side, to avoid breaking existing things.

Contributor Author

Yes, I will change my code to use the sbserver.

Contributor Author

The sbserver is back.

// that the container to be launched in.
//
// This field is empty when no sandbox is specified.
string sandbox_key = 12;
Member

Can you provide an example?

Contributor Author

@abel-von abel-von Oct 13, 2022

Excuse me, I'm not sure whether you mean an example of sandbox_key or the implementation of our sandboxer.
If you mean sandbox_key, it is the ID of a created sandbox, like the snapshot_key field, which is the snapshot ID for the container.

If you mean an implementation of the sandboxer: I actually added an embedded "Sandboxer" based on runc, which I named linux-sandbox. It can be run if you pull my branch from this PR and compile containerd.

To run this containerd, we have to prepare a "pause" executable and place it in the /bin folder, which is the default location where the runc sandboxer looks for it. We do this to remove the Sandboxer's dependency on the Snapshotter, so we no longer need a pause image to start a sandbox. Then we made a small change to the containerd config file:

[plugins.cri.containerd.runtimes.runc]
  runtime_type="io.containerd.runc.v2"
  sandboxer="linux-sandbox"

@mxpv
Member

mxpv commented Oct 12, 2022

I think we can introduce the concept of a "Sandboxer" to containerd, analogous to a "Snapshotter": every sandbox management operation should specify a "Sandboxer". We can provide different kinds of sandboxers, such as a hypervisor-based sandboxer, a cgroup/namespace-based sandboxer like runc, and wasm-based sandboxers that may appear in the future.

These have to run as separate daemons (similar to remote snapshotters), right?

Right now we have to launch a shim process for every pod, which is an overhead that cannot be ignored.

Shims may return the same address for every shim --start call if they want to be a singleton. This way you can get one instance to manage all pods.
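A minimal illustration of that singleton pattern: a shim's start handler can ignore the per-pod ID and always hand back one shared address, so containerd connects every pod to the same shim instance. The socket path and function name here are made up for the sketch.

```go
package main

import "fmt"

// startShim mimics the shim --start contract at a toy level: it must
// report the address containerd should connect to. A singleton shim
// returns the same fixed address no matter which pod is starting, so
// one shim process ends up serving all pods.
func startShim(podID string) string {
	_ = podID // deliberately unused: one shim serves every pod
	return "unix:///run/containerd/shared-shim.sock"
}

func main() {
	// Both pods are routed to the same shim instance.
	fmt.Println(startShim("pod-a"))
	fmt.Println(startShim("pod-b"))
}
```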

This takes a slightly different approach than our current implementation. I'd like to better understand the pros and cons.

@samuelkarp samuelkarp added the status/needs-discussion Needs discussion and decision from maintainers label Oct 12, 2022
@abel-von
Contributor Author

abel-von commented Oct 13, 2022

These have to run as separate daemons (similar to remote snapshotters), right?

Also like the Snapshotter, the Sandboxer (actually the SandboxController) can run embedded in containerd or as a separate daemon. I have implemented a linux-sandbox controller (based on runc) inside containerd in this PR.

Shims may return the same address for every shim --start call if they want to be a singleton. This way you can get one instance to manage all pods.

This takes a slightly different approach than our current implementation. I'd like to better understand the pros and cons.

I think the biggest pro is that it clearly separates the notions of sandbox and container. The sandbox is currently stored as a so-called "sandbox container" in containerd, but a sandbox is a sandbox; it is not a container or a Task, and we shouldn't create a container or start a task to start a sandbox. With the design in this PR, sandbox management can be separated from task and container management.

Also, I think the "shim" is a compromise design in many scenarios, such as dockershim or kata-shim. After the API is unified and becomes stable, the shims may be removed; the new Sandbox API may be the end of the containerd shim. We start a sandbox and create containers in the sandbox by calling the Task API provided by the sandbox, with no more management of shim processes in containerd.

@mxpv
Member

mxpv commented Oct 13, 2022

I think the biggest pro is that it clearly separates the notions of sandbox and container.

I believe this is how the current sandbox API implementation works. We use sandbox.Store/Controllers to manage sandbox metadata/lifecycle, and we use the existing task service to manage both regular and sandboxed containers (which are distinguished by the sandbox ID in metadata).

The sandbox is currently stored as a so-called "sandbox container" in containerd, but a sandbox is a sandbox; it is not a container or a Task, and we shouldn't create a container or start a task to start a sandbox.

Correct, we don't create a container anymore. Please see the sbserver package, #7312 (we forked the CRI server for the sandbox API work).

After the API is unified and becomes stable, the shims may be removed; the new Sandbox API may be the end of the containerd shim.

Though I haven't chased this use case, it's quite possible to achieve this with the current design as well.


SandboxInstance containerd.Sandbox

Spec *oci.Spec
Contributor

We don't need to use a container to implement a sandbox, so do we need Spec? Or is this just an early transitional version?

Contributor Author

There should still be a struct that describes the sandbox. I thought the Spec was good enough for this, so I reused the Spec rather than defining a new one.

map<string, string> labels = 5;
google.protobuf.Timestamp created_at = 6;
google.protobuf.Timestamp updated_at = 7;
map<string, google.protobuf.Any> extensions = 8;
Contributor

Do we need a field for resources? For example, the network namespace, which was previously passed through the pause container spec.

Contributor Author

It is all in the spec.

@@ -0,0 +1,130 @@
/*
Contributor

Seems like a previous version here? Didn't you rebase onto the master branch?

Contributor Author

Yes, I reverted them because we may not need the shim code changes if the sandbox is implemented as in this PR.

Contributor Author

Now the shim changes are back.


func (s *sandboxController) Start(ctx context.Context, sandbox *sandbox.Sandbox) (*sandbox.Sandbox, error) {
log.G(ctx).Debugf("start runc sandbox %s with a pause container", sandbox.ID)
sandbox.Spec.Process.Args = []string{"/pause"}
Contributor

The current implementation of the runc Sandboxer still uses a task and a shim process, right?

Contributor Author

There will be a pause container started by runc to provide the sandbox namespaces, but the task server is embedded in containerd, so there is no shim process for a runc pod.

@wllenyj
Contributor

wllenyj commented Oct 13, 2022

I think the biggest pro is that it clearly separates the notions of sandbox and container. The sandbox is currently stored as a so-called "sandbox container" in containerd, but a sandbox is a sandbox; it is not a container or a Task, and we shouldn't create a container or start a task to start a sandbox. With the design in this PR, sandbox management can be separated from task and container management.

It's cool.

Also, I think the "shim" is a compromise design in many scenarios, such as dockershim or kata-shim. After the API is unified and becomes stable, the shims may be removed; the new Sandbox API may be the end of the containerd shim. We start a sandbox and create containers in the sandbox by calling the Task API provided by the sandbox, with no more management of shim processes in containerd.

The changes are expected to be huge.

@abel-von
Contributor Author

abel-von commented Oct 14, 2022

The changes are expected to be huge.

Actually, we added a sandbox_key to the container; if it is not set, containerd follows the same process as before, so I think compatibility is easy to maintain.
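The compatibility behavior described here amounts to a single branch on sandbox_key. A sketch (routeContainer and the returned path names are hypothetical, purely to illustrate the fallback):

```go
package main

import "fmt"

// routeContainer decides which code path containerd would take for a
// container: if no sandbox_key is set, fall back to the pre-existing
// shim-based flow; otherwise use the new sandboxer flow.
func routeContainer(sandboxKey string) string {
	if sandboxKey == "" {
		return "legacy-shim"
	}
	return "sandboxer"
}

func main() {
	// Old-style container (no sandbox_key) vs. a sandboxer-managed one.
	fmt.Println(routeContainer(""), routeContainer("sandbox-123"))
}
```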

@abel-von abel-von force-pushed the support-sandbox-new branch 2 times, most recently from 74935be to e97aa31 Compare October 19, 2022 06:30
@abel-von abel-von force-pushed the support-sandbox-new branch 9 times, most recently from 6e7873c to b10a528 Compare October 19, 2022 14:29
A sandboxer is a plugin to manage sandbox lifecycle and resources.

Signed-off-by: Fengshaobao <fshb1998@gmail.com>
@abel-von abel-von closed this Mar 24, 2023