Contributor

@abel-von abel-von commented Oct 10, 2022

Hi, I think introducing the sandbox API to containerd in #4131 is an excellent idea, so I have been following all the code changes for this feature, and I also have some ideas about it.

I think we can introduce the concept of a "Sandboxer" to containerd, analogous to a "Snapshotter": every sandbox management operation should specify a "Sandboxer". We can provide different kinds of sandboxers, such as a hypervisor-based sandboxer, a cgroup/namespace-based sandboxer like runc, and wasm-based sandboxers that may appear in the future.

I think with this "Sandboxer" introduced, we can completely remove the "shim". Right now we have to launch a shim process for every pod, which is an overhead that cannot be ignored, but with a sandboxer the shim process may be removed.
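To make the analogy concrete, the "Sandboxer" idea above can be sketched as a small Go plugin interface, similar in spirit to how a Snapshotter is plugged into containerd. This is only an illustrative sketch; the names Sandboxer, Start, Stop, and TaskAddress are my assumptions, not the actual API in this PR.

```go
package main

import "fmt"

// Sandbox describes a running sandbox instance. TaskAddress is where the
// sandbox exposes its Task API (a unix socket, vsock, or TCP address).
type Sandbox struct {
	ID          string
	TaskAddress string
}

// Sandboxer manages sandbox lifecycle, analogous to how a Snapshotter
// manages snapshots. (Hypothetical interface, for illustration only.)
type Sandboxer interface {
	Start(id string) (*Sandbox, error)
	Stop(id string) error
}

// linuxSandboxer is a toy stand-in for the embedded "linux-sandbox"
// (runc/namespace-based) sandboxer described in this PR.
type linuxSandboxer struct{}

func (l *linuxSandboxer) Start(id string) (*Sandbox, error) {
	// A real implementation would launch a pause process and set up
	// namespaces; here we only fabricate the task address.
	return &Sandbox{ID: id, TaskAddress: "unix:///run/sandbox/" + id + ".sock"}, nil
}

func (l *linuxSandboxer) Stop(id string) error { return nil }

func main() {
	var s Sandboxer = &linuxSandboxer{}
	sb, _ := s.Start("pod-1")
	fmt.Println(sb.ID, sb.TaskAddress)
}
```

A hypervisor-based or wasm-based sandboxer would implement the same interface with a different Start/Stop strategy, which is the whole point of the abstraction.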

So I made some modifications to the sandbox API code, and it works for runc containers without a shim process (container status and IO can be recovered after a containerd restart). I also built our own hypervisor-based sandboxer, and it works as well.

To run this containerd, we have to prepare a "pause" executable and place it in the /bin folder, which is the default location where the runc sandboxer looks for it. We do this to remove the Sandboxer's dependency on the Snapshotter, so we no longer need a pause image to start a sandbox. Then we made a small change to the containerd config file:

[plugins.cri.containerd.runtimes.runc]
  runtime_type="io.containerd.runc.v2"
  sandboxer="linux-sandbox"

Then we can create sandboxes and containers with crictl or ctr. A sandboxes subcommand is introduced for ctr, with a --sandboxer parameter whose default value is the embedded linux-sandbox.

You can see that there is no longer a shim process for containers, while recovery of container status and IO still works (except for tty IO).

The details are explained in this doc:
https://docs.google.com/document/d/17_URs9UV3s_7xbaXCLYrCJf-0LhKLOD-NPKCsVdFpGM/edit

@k8s-ci-robot

Hi @abel-von. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@abel-von abel-von mentioned this pull request Oct 10, 2022
17 tasks
@abel-von abel-von force-pushed the support-sandbox-new branch from 3a1c2e6 to 07738d3 Compare October 10, 2022 13:46
Member

@mxpv mxpv left a comment

Thanks for opening this!
I have lots of questions :)

rpc Pause (ControllerPauseRequest) returns (google.protobuf.Empty);
rpc Resume (ControllerResumeRequest) returns (google.protobuf.Empty);
rpc Update (ControllerUpdateRequest) returns (ControllerUpdateResponse);
rpc AppendContainer (ControllerAppendContainerRequest) returns (ControllerAppendContainerResponse);
Member

Can you explain a bit more here?

In the current implementation we use tasks to run sandboxed containers (with sandbox_id to specify which sandbox instance the container belongs to); this way we have the task service to control a specific container's lifecycle.

Do we still use the task service?
Why do we want Controller to handle containers?
How to manage container's lifecycle?

Contributor Author

@abel-von abel-von Oct 13, 2022

A Sandboxer is much like a Snapshotter; it is for sandbox management. When the CRI API RunPodSandbox is called, containerd just calls the Sandboxer to start a new sandbox and returns; it no longer creates a pause container to imitate a sandbox. (Only in the runc scenario do we have to create a pause container to provide the sandbox namespaces, but this is now done inside the "runc sandboxer", so containerd still does not know about the existence of a pause container.) Every sandbox exposes its Task API through a "TaskAddress", which is either a unix domain socket, a vsock, or maybe even a TCP address. When we call CRI CreateContainer, containerd first calls AppendContainer to append the container-related resources to the sandbox, and then connects to the sandbox's "TaskAddress" to perform Task API calls.

Do we still use the task service?

So yes, we still use the task service; it is just that the task service is provided by the created sandbox.

Why do we want Controller to handle containers?

The AppendContainer/UpdateContainer/RemoveContainer APIs are for the sandbox to update its resources when a container in it is added/updated/deleted (maybe the method names could be changed to AppendContainerResource/UpdateContainerResource/RemoveContainerResource?). Maybe the Update API alone would be enough, but then containerd would have to calculate the sandbox's total resources whenever a container is added/updated/deleted and call the sandbox's Update API, and different sandbox implementations may have their own logic for updating sandbox resources based on the container spec, so I added these three methods to the Controller. AppendContainer is called when creating a container, UpdateContainer is called when exec'ing a process in a container or updating container resources, and RemoveContainer is of course called when deleting a container.
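To illustrate the tradeoff being described: with only a generic Update call, containerd itself would have to aggregate the sandbox total from all container specs, whereas AppendContainer lets each sandboxer apply its own policy, e.g. a hypervisor sandboxer adding fixed guest overhead. A toy aggregation, with all names and the overhead number hypothetical:

```go
package main

import "fmt"

// ContainerRes is a simplified view of a container's resource request.
type ContainerRes struct {
	MemoryBytes int64
	CPUMillis   int64
}

// sandboxTotal is what containerd would have to compute itself if the
// Controller only exposed a generic Update call: the plain sum of all
// container resources in the pod.
func sandboxTotal(cs []ContainerRes) ContainerRes {
	var t ContainerRes
	for _, c := range cs {
		t.MemoryBytes += c.MemoryBytes
		t.CPUMillis += c.CPUMillis
	}
	return t
}

// hypervisorTotal shows why a sandboxer may want to own this logic:
// a VM-based sandboxer might add a fixed per-guest memory overhead
// (the 128 MiB here is an arbitrary illustrative number).
func hypervisorTotal(cs []ContainerRes) ContainerRes {
	t := sandboxTotal(cs)
	t.MemoryBytes += 128 << 20
	return t
}

func main() {
	cs := []ContainerRes{
		{MemoryBytes: 256 << 20, CPUMillis: 500},
		{MemoryBytes: 512 << 20, CPUMillis: 250},
	}
	fmt.Println(sandboxTotal(cs), hypervisorTotal(cs))
}
```

Pushing this logic behind AppendContainer/UpdateContainer/RemoveContainer keeps such implementation-specific accounting out of containerd itself.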

How to manage a container's lifecycle?

Container lifecycle management still goes through Task API calls, as I mentioned above.

pkg/cri/cri.go Outdated
log.G(ctx).Info("using legacy CRI server")
s, err = server.NewCRIService(c, client)
}
s, err := server.NewCRIService(c, client)
Member

We use sbserver for all ongoing sandbox work on CRI side, to avoid breaking existing things.

Contributor Author

Yes, I will change my code to use the sbserver.

Contributor Author

The sbserver is back.

// that the container to be launched in.
//
// This field is empty when no sandbox is specified.
string sandbox_key = 12;
Member

Can you provide an example?

Contributor Author

@abel-von abel-von Oct 13, 2022

Excuse me, I'm not sure whether you mean an example of sandbox_key or the implementation of our sandboxer.
If you mean sandbox_key, it is the ID of a created sandbox, like the snapshot_key field, which is the snapshot ID for the container.

If you mean an implementation of the sandboxer: I actually added an embedded "Sandboxer" based on runc, which I named linux-sandbox. It can be run if you pull my branch from this PR and compile containerd.

To run this containerd, we have to prepare a "pause" executable and place it in the /bin folder, which is the default location where the runc sandboxer looks for it. We do this to remove the Sandboxer's dependency on the Snapshotter, so we no longer need a pause image to start a sandbox. Then we made a small change to the containerd config file:

[plugins.cri.containerd.runtimes.runc]
  runtime_type="io.containerd.runc.v2"
  sandboxer="linux-sandbox"

@mxpv
Member

mxpv commented Oct 12, 2022

I think we can introduce the concept of a "Sandboxer" to containerd, analogous to a "Snapshotter": every sandbox management operation should specify a "Sandboxer". We can provide different kinds of sandboxers, such as a hypervisor-based sandboxer, a cgroup/namespace-based sandboxer like runc, and wasm-based sandboxers that may appear in the future.

These have to run as separate daemons (similar to remote snapshotters), right?

Right now we have to launch a shim process for every pod, which is an overhead that cannot be ignored.

Shims may return the same address for every shim --start call if they want to be a singleton. This way you can get one instance to manage all pods.
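A minimal illustration of that singleton pattern: a shim's start handler can ignore the per-pod ID and always hand back one shared address, so containerd connects every pod to the same shim instance. The socket path and function name here are made up for the sketch.

```go
package main

import "fmt"

// startShim mimics the shim --start contract at a toy level: it must
// report the address containerd should connect to. A singleton shim
// returns the same fixed address no matter which pod is starting, so
// one shim process ends up serving all pods.
func startShim(podID string) string {
	_ = podID // deliberately unused: one shim serves every pod
	return "unix:///run/containerd/shared-shim.sock"
}

func main() {
	// Both pods are routed to the same shim instance.
	fmt.Println(startShim("pod-a"))
	fmt.Println(startShim("pod-b"))
}
```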

This takes a slightly different approach than our current implementation. I'd like to better understand the pros and cons.

@samuelkarp samuelkarp added the status/needs-discussion Needs discussion and decision from maintainers label Oct 12, 2022
@abel-von
Contributor Author

abel-von commented Oct 13, 2022

These have to run as separate daemons (similar to remote snapshotters), right?

Also like the Snapshotter, the Sandboxer (actually the SandboxController) can run embedded in containerd or as a separate daemon. I have implemented a linux-sandbox controller (based on runc) inside containerd in this PR.

Shims may return the same address for every shim --start call if they want to be a singleton. This way you can get one instance to manage all pods.

This takes a slightly different approach than our current implementation. I'd like to better understand the pros and cons.

I think the biggest pro is that it clearly separates the notions of sandbox and container. The sandbox is currently stored as a so-called "sandbox container" in containerd, but a sandbox is a sandbox; it is not a container or a Task, and we shouldn't create a container or start a task to start a sandbox. With the design in this PR, sandbox management can be separated from task and container management.

Also, I think the "shim" is a compromise design in many scenarios, such as dockershim or kata-shim. After the API is unified and becomes stable, the shims may be removed; the new Sandbox API may be the end of the containerd shim. We start a sandbox and create containers in the sandbox by calling the Task API provided by the sandbox, with no more management of shim processes in containerd.

@mxpv
Member

mxpv commented Oct 13, 2022

I think the biggest pro is that it clearly separates the notions of sandbox and container.

I believe this is how the current sandbox API implementation works. We use sandbox.Store/Controllers to manage sandbox metadata/lifecycle, and we use the existing task service to manage both regular and sandboxed containers (which are distinguished by the sandbox ID in metadata).

The sandbox is currently stored as a so-called "sandbox container" in containerd, but a sandbox is a sandbox; it is not a container or a Task, and we shouldn't create a container or start a task to start a sandbox.

Correct, we don't create a container anymore. Please see the sbserver package, #7312 (we forked the CRI server for the sandbox API work).

After the API is unified and becomes stable, the shims may be removed; the new Sandbox API may be the end of the containerd shim.

Though I haven't chased this use case, it's quite possible to achieve this with the current design as well.


SandboxInstance containerd.Sandbox

Spec *oci.Spec
Contributor

We don't need to use a container to implement a sandbox, so do we need Spec? Or is this just an early transitional version?

Contributor Author

There should still be a struct that describes the sandbox. I thought the Spec was good enough for this, so I reused the Spec rather than defining a new one.

map<string, string> labels = 5;
google.protobuf.Timestamp created_at = 6;
google.protobuf.Timestamp updated_at = 7;
map<string, google.protobuf.Any> extensions = 8;
Contributor

Do we need a field for resources? For example, the network namespace, which was previously passed through the pause container spec.

Contributor Author

It is all in the spec.

@@ -0,0 +1,130 @@
/*
Contributor

Seems like a previous version here? Didn't you rebase onto the master branch?

Contributor Author

Yes, I reverted them because we may not need the shim code changes if the sandbox is implemented as in this PR.

Contributor Author

Now the shim changes are back.


func (s *sandboxController) Start(ctx context.Context, sandbox *sandbox.Sandbox) (*sandbox.Sandbox, error) {
log.G(ctx).Debugf("start runc sandbox %s with a pause container", sandbox.ID)
sandbox.Spec.Process.Args = []string{"/pause"}
Contributor

The current implementation of the runc Sandboxer still uses a task and a shim process, right?

Contributor Author

There will be a pause container started by runc to provide the sandbox namespaces, but the task server is embedded in containerd, so there is no shim process for a runc pod.

@wllenyj
Contributor

wllenyj commented Oct 13, 2022

I think the biggest pro is that it clearly separates the notions of sandbox and container. The sandbox is currently stored as a so-called "sandbox container" in containerd, but a sandbox is a sandbox; it is not a container or a Task, and we shouldn't create a container or start a task to start a sandbox. With the design in this PR, sandbox management can be separated from task and container management.

It's cool.

Also, I think the "shim" is a compromise design in many scenarios, such as dockershim or kata-shim. After the API is unified and becomes stable, the shims may be removed; the new Sandbox API may be the end of the containerd shim. We start a sandbox and create containers in the sandbox by calling the Task API provided by the sandbox, with no more management of shim processes in containerd.

The changes are expected to be huge.

@abel-von
Contributor Author

abel-von commented Oct 14, 2022

The changes are expected to be huge.

Actually, we added a sandbox_key to the container; if it is not set, containerd follows the same process as before, so I think compatibility is easy to maintain.
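The compatibility behavior described here amounts to a single branch on sandbox_key. A sketch (routeContainer and the returned path names are hypothetical, purely to illustrate the fallback):

```go
package main

import "fmt"

// routeContainer decides which code path containerd would take for a
// container: if no sandbox_key is set, fall back to the pre-existing
// shim-based flow; otherwise use the new sandboxer flow.
func routeContainer(sandboxKey string) string {
	if sandboxKey == "" {
		return "legacy-shim"
	}
	return "sandboxer"
}

func main() {
	// Old-style container (no sandbox_key) vs. a sandboxer-managed one.
	fmt.Println(routeContainer(""), routeContainer("sandbox-123"))
}
```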

@abel-von abel-von force-pushed the support-sandbox-new branch 2 times, most recently from 74935be to e97aa31 Compare October 19, 2022 06:30
@abel-von abel-von force-pushed the support-sandbox-new branch 9 times, most recently from 6e7873c to b10a528 Compare October 19, 2022 14:29
A sandboxer is a plugin to manage sandbox lifecycle and resources.

Signed-off-by: Fengshaobao <fshb1998@gmail.com>
@abel-von abel-von closed this Mar 24, 2023