Introduce the concept of "Sandboxer" #7502
Thanks for opening this!
I have lots of questions :)
rpc Pause (ControllerPauseRequest) returns (google.protobuf.Empty);
rpc Resume (ControllerResumeRequest) returns (google.protobuf.Empty);
rpc Update (ControllerUpdateRequest) returns (ControllerUpdateResponse);
rpc AppendContainer (ControllerAppendContainerRequest) returns (ControllerAppendContainerResponse);
Can you explain a bit more here?
In the current implementation we use tasks to run sandboxed containers (with sandbox_id specifying which sandbox instance a container belongs to), so we have the task service to control a specific container's lifecycle.
Do we still use the task service?
Why do we want Controller to handle containers?
How do we manage a container's lifecycle?
Sandboxer is much like a Snapshotter: it is for sandbox management. When the CRI API RunPodSandbox is called, containerd just calls the Sandboxer to start a new sandbox and returns; it no longer creates a pause container to imitate a sandbox. (Only in the runc scenario do we need a pause container to provide the sandbox namespaces, but that can now be done inside the "runc sandboxer", so containerd still does not know about the existence of a pause container.) Every sandbox exposes its Task API through a "TaskAddress", which is a unix domain socket, a vsock, or maybe even a TCP address. When CRI CreateContainer is called, containerd first calls AppendContainer to append the container's resources to the sandbox, and then connects to the sandbox's "TaskAddress" to perform Task API calls.
Do we still use task service?
Yes, we still use the task service; it is just that the task service is provided by the created sandbox.
Why do we want Controller to handle containers?
The AppendContainer/UpdateContainer/RemoveContainer APIs are for updating the sandbox's resources when a container is added/updated/deleted in it (maybe the method names should be changed to AppendContainerResource/UpdateContainerResource/RemoveContainerResource?). The Update API alone might be enough, but then containerd would have to calculate the sandbox's total resources whenever a container is added/updated/deleted and call the sandbox's Update API; also, different sandbox implementations may have their own logic for updating sandbox resources based on the container spec, so I added these three methods to Controller. AppendContainer is called when creating a container, UpdateContainer is called when exec'ing a process in a container or updating container resources, and RemoveContainer is of course called when deleting a container.
How to manage container's lifecycle?
Container lifecycle management is still done through Task API calls, as I mentioned above.
pkg/cri/cri.go
log.G(ctx).Info("using legacy CRI server")
s, err = server.NewCRIService(c, client)
}
s, err := server.NewCRIService(c, client)
We use sbserver for all ongoing sandbox work on the CRI side, to avoid breaking existing things.
Yes, I will change my code to use the sbserver.
The sbserver is back.
// that the container is to be launched in.
//
// This field is set to empty when no sandbox is specified.
string sandbox_key = 12;
Can you provide an example?
Excuse me, I don't get whether you mean an example of sandbox_key or of the implementation of our sandboxer.
If you mean sandbox_key, it is actually the id of a created sandbox, like the field snapshot_key, which holds the snapshot id for the container.
If you mean an implementation of the sandboxer: I actually added an embedded "Sandboxer" based on runc, which I named linux-sandbox. It can be run if you pull my branch in this PR and compile containerd.
To make this containerd run, we have to prepare a "pause" executable file and move it into the /bin folder, as that is the default location where the runc sandboxer looks for the pause executable. We do this to remove the Sandboxer's dependency on the Snapshotter, so we no longer need a pause image to start a sandbox. Then we made some modifications to the containerd config file: [plugins.cri.containerd.runtimes.runc] runtime_type="io.containerd.runc.v2" sandboxer="linux-sandbox"
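Laid out as a config fragment, the runtime change described above might look like the following (the section path is taken verbatim from the comment; the final TOML layout in a released containerd may differ):

```toml
# containerd config.toml — sketch of the sandboxer selection described above
[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  # Assumed new option from this PR: pick the embedded runc-based sandboxer.
  sandboxer = "linux-sandbox"
```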
These have to run as separate daemons (similarly to remote snapshotters), right?
Shims may return the same address for every … This takes a slightly different approach than our current implementation. I'd like to better understand the pros and cons.
Also, like the Snapshotter, the Sandboxer (actually the SandboxController) can run embedded in containerd or as a separate daemon. I have implemented a linux-sandbox controller (based on runc) inside containerd in this PR.
I think the biggest pro is that it clearly separates the notions of sandbox and container. The sandbox is currently stored as a so-called "sandbox container" in containerd, but I think a sandbox is a sandbox: it is not a container or a Task, and we shouldn't create a container or start a task in order to start a sandbox. With the design in this PR, sandbox management can be separated from task and container management. Also, I think the "shim" is a compromise design in many scenarios, such as dockershim or kata-shim; after the API is unified and becomes stable, the shims may be removed. The new Sandbox API may be the end of the containerd shim: we start a sandbox and create containers in it by calling the Task API provided by the sandbox, with no more management of shim processes in containerd.
I believe this is how the current sandbox API implementation works. We use sandbox.Store/Controllers to manage sandbox metadata/lifecycle, and we use the existing task service to manage both regular and sandboxed containers (which are distinguished by sandbox ID in metadata).
Correct, we don't create a container anymore. Please see
Though I didn't chase this use case, it's quite possible to have this with the current design as well.
SandboxInstance containerd.Sandbox
Spec *oci.Spec
We don't need to use a container to implement a sandbox, so do we need Spec? Or is this just an early transitional version?
There should still be a struct to describe the sandbox. I thought the Spec was good enough for this, so I reused Spec rather than defining a new one.
map<string, string> labels = 5;
google.protobuf.Timestamp created_at = 6;
google.protobuf.Timestamp updated_at = 7;
map<string, google.protobuf.Any> extensions = 8;
Do we need a field for resources? For example, the network namespace, which was previously passed through the pause container's spec.
It is all in the spec.
runtime/task_list.go
Seems like a previous version here? Didn't you rebase onto the master branch?
Yes, I just reverted them, because we may not need the shim code changes if the sandbox is implemented as in this PR.
Now the changes to the shims are back.
func (s *sandboxController) Start(ctx context.Context, sandbox *sandbox.Sandbox) (*sandbox.Sandbox, error) {
log.G(ctx).Debugf("start runc sandbox %s with a pause container", sandbox.ID)
sandbox.Spec.Process.Args = []string{"/pause"}
The current implementation of runc's Sandboxer is still a task and a shim process, right?
There will be a pause container started by runc to provide the sandbox namespaces, but the task server is embedded in containerd, so there is no shim process for a runc pod.
It's cool.
The changes are expected to be huge.
Actually we added a sandbox_key to the container; if it is not set, containerd will follow the same process as before, so I think it is easy to keep things compatible.
A sandboxer is a plugin to manage sandbox lifecycle and resources. Signed-off-by: Fengshaobao <fshb1998@gmail.com>
Hi, I think it is really an excellent idea to introduce the sandbox API to containerd in #4131, so I have followed all of the code changes for this feature, and I have some ideas about it.
I think we can introduce the concept of a "Sandboxer", like "Snapshotter", to containerd: all sandbox management should go through a specified "Sandboxer". We can provide different kinds of sandboxers, such as hypervisor-based sandboxers, cgroup/namespace-based sandboxers like runc, and wasm-based sandboxers that may appear in the future.
I think with this "Sandboxer" introduced, we can completely remove the "shim". Right now we have to launch a shim process for every pod, which is an overhead that cannot be ignored; with a sandboxer, the shim process may be removed.
So I made some modifications to the sandbox API code, and it works for runc containers without a shim process (container status and IO can be recovered after a containerd restart). I also built our own hypervisor-based sandboxer, and it also works.
To make this containerd run, we have to prepare a "pause" executable file and move it into the /bin folder, as that is the default location where the runc sandboxer looks for the pause executable. We do this to remove the Sandboxer's dependency on the Snapshotter, so we no longer need a pause image to start a sandbox. Then we made some modifications to the containerd config file. After that, we can create sandboxes and containers with crictl or ctr. The sandboxes subcommand is introduced for ctr, with a --sandboxer parameter whose default value is the embedded linux-sandbox.
You can see that there is no shim process for the containers, while recovery of container status and IO still works (except for tty IO).

The details are explained in this doc:
https://docs.google.com/document/d/17_URs9UV3s_7xbaXCLYrCJf-0LhKLOD-NPKCsVdFpGM/edit