-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Runtime v2 (shim API) #2434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runtime v2 (shim API) #2434
Conversation
client.go
Outdated
@@ -79,8 +79,12 @@ func New(address string, opts ...ClientOpt) (*Client, error) { | |||
return nil, err | |||
} | |||
} | |||
rt := "io.containerd.oci.v1" | |||
if runtime.GOOS != "linxu" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, don't have the time to finish goign through it all atm, but I'll try to get another pass at it.
client.go
Outdated
@@ -79,8 +79,12 @@ func New(address string, opts ...ClientOpt) (*Client, error) { | |||
return nil, err | |||
} | |||
} | |||
rt := "io.containerd.oci.v1" | |||
if runtime.GOOS != "linxu" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linux
cmd/containerd-shim-oci-v1/main.go
Outdated
func main() { | ||
if err := shim.Run(oci.New); err != nil { | ||
logrus.WithError(err).Error("shim run") | ||
fmt.Fprintf(os.Stderr, "containerd-shim-process-v1: %s\n", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
containerd-shim-oci-v1
@@ -871,70 +875,6 @@ func TestContainerKillAll(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestShimSigkilled(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not handled anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the test and it really wasn't testing much, killing the shim then killing the process. I don't think it was a valid test or it was and changed sometime in the past. Felt safe to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It checks that the container process (redis) is correctly cleaned up if the shim crash/is sigkilled (e.g. by oom).
It doesn't actually kill the process, it checks that it died.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, my bad, i'll fix it. i missed the kill 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this test is still removed in the latest commit? or did I miss a move somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so I researched this more and the test is only valid for v1 when container was still running and didn't have to reconnect to a shim again. The reason is that it sets an exit handler that does the "cleanup on dead shim" code. However, this only runs if the shim process that containerd is doing a waidpid on returns and checks the shim pid.
I think its safe to remove this as it doesn't really work, it only works in the tests because the daemon isn't restarted. If we want to keep it, i can do some refactoring but it will take a little bit of work to make this work across daemon reboots with running shims. I can do that in a followup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
I haven't yet got the time to compile and play around with the new API to see how it actually behave.
Having a way to start a "cleanup shim" in such scenario may be a cleaner solution for v2 though (and if that fails, things are left to the user to clean).
if wg != nil { | ||
defer wg.Done() | ||
} | ||
|
||
stats, err := cg.Stat(cgroups.IgnoreNotExist) | ||
ctx := namespaces.WithNamespace(context.Background(), t.Namespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No timeout / cancel / Sync ? We could end up with several go routine getting the stats on the same task, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are sync'd in the caller, hard to have a timeout for this because under load, collection could be slower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, right. I missed the waitgroup, my bad.
IoGID int | ||
WorkDir string | ||
|
||
id string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, but could we bundle exported variables together?
runtime/v2/bundle.go
Outdated
} | ||
}() | ||
// create base directories | ||
for _, d := range paths { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the separation between the base dir and the last dir in the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base should always be created while mkdir should return an exists
error if something is already there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason behind doing them in 2 loops if the remove all happens on error anyway? Also what does it mean if Mkdir
returns an exist on the second part, seems odd that it would both return an exists error and delete the directory.
runtime/v2/oci/service_linux.go
Outdated
|
||
outw, err := fifo.OpenFifo(ctx, stdout, syscall.O_WRONLY, 0) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleanup stdin if created?
runtime/v2/oci/service_linux.go
Outdated
} | ||
outr, err := fifo.OpenFifo(ctx, stdout, syscall.O_RDONLY, 0) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing cleanup of in/outw
@crosbymichael - This is a really good start. I can rebase my windows side changes on this and resubmit |
e3c9050
to
b238640
Compare
Codecov Report
@@ Coverage Diff @@
## master #2434 +/- ##
==========================================
- Coverage 44.76% 44.73% -0.04%
==========================================
Files 92 93 +1
Lines 9483 9490 +7
==========================================
Hits 4245 4245
- Misses 4555 4562 +7
Partials 683 683
Continue to review full report at Codecov.
|
if werr == nil { | ||
err2 = os.RemoveAll(work) | ||
if err2 == nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrapping this could be useful, I don't think the message includes the path
runtime/v2/shim.go
Outdated
} | ||
|
||
func (b *binary) Delete(ctx context.Context) (*runtime.Exit, error) { | ||
cmd, err := shimCommand(ctx, b.runtime, b.containerdAddress, b.bundle, "-debug", "delete") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this command not given the bundle id anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not needed and comes on the Create request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this exec with the delete
arg intended to do?
runtime/v2/util.go
Outdated
args = append(args, cmdArgs...) | ||
name := getShimBinaryName(runtime) | ||
if !filepath.IsAbs(name) { | ||
_, err = exec.LookPath(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently name
will never be absolute, but is it worth considering a case where name is in the same directory as self
but not in the lookup path? In that case the name might be better found at os.Join(os.Dir(self), name)
. Unless there is a plan here to document setting PATH
on containerd for the runtime binary installation location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think for now, we say that it must be in PATH. i'll remove this abs code
runtime/v2/shim.go
Outdated
// newShim starts and returns a new shim | ||
func newShim(ctx context.Context, bundle *Bundle, runtime, containerdAddress string, events *exchange.Exchange, rt *runtime.TaskList) (_ *shim, err error) { | ||
if err := identifiers.Validate(bundle.ID); err != nil { | ||
return nil, errors.Wrapf(err, "invalid task id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: either Wrap
or add the ID?
runtime/v2/shim.go
Outdated
}() | ||
select { | ||
case <-time.After(1 * time.Second): | ||
terminate(s.shimPid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return or log this error?
runtime/v2/shim.go
Outdated
|
||
func (s *shim) Exec(ctx context.Context, id string, opts runtime.ExecOpts) (runtime.Process, error) { | ||
if err := identifiers.Validate(id); err != nil { | ||
return nil, errors.Wrapf(err, "invalid exec id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include the ID?
} | ||
return runtime.State{}, errdefs.ErrNotFound | ||
} | ||
var status runtime.Status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check the StatusUnknown
task type or have a runtime.UnknownStatus
?
263acdb
to
7976ef8
Compare
aeb0941
to
578358e
Compare
services/tasks/local.go
Outdated
@@ -547,32 +585,26 @@ func getTasksMetrics(ctx context.Context, filter filters.Filter, tasks []runtime | |||
case "id": | |||
return t.ID(), true | |||
case "namespace": | |||
return t.Info().Namespace, true | |||
// return t.Info().Namespace, true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting here is off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a default case and explanation for not using now?
8deea2c
to
9a1e7b9
Compare
runtime/v2/binary.go
Outdated
if err != nil { | ||
return nil, errors.Wrapf(err, "%s", out) | ||
} | ||
address := string(out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be trimmed? Also wondering if we should just take the stdout rather than combined output.
You previously expressed that this extra connect could cause some additional time, although mostly negligible. Could we still pass in the socket and if the returned address doesn't match just remove it and create a new connection, or would that end up being even longer?
bb10512
to
20cf7ef
Compare
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
runtime/v2/runc/options/oci.proto
Outdated
option go_package = "github.com/containerd/containerd/runtime/v2/runc/options;options"; | ||
|
||
message Options { | ||
bool no_pivot_root = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this isn't popular, but let's document the intent of these fields.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
// each container and allows reattaching to the IO and receiving the exit status | ||
// for the container processes. | ||
service Task { | ||
rpc State(StateRequest) returns (StateResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to document what each method does.
I don't see anything wildly controversial here. It would be good to document the methods and fields used in GRPC. |
@stevvooe I was going to do documentation in a separate PR. I'll take care of proto comments there as well. I wanted to make a document for shim authors, "Building a shim 101" |
@crosbymichael Will take a look at this PR tomorrow. Sorry for the delay. |
This readme specifies shim api and authoring docs. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update the description and added a readme for shim authors. Runtime v2Runtime v2 introduces a first class shim API for runtime authors to integrate with containerd. Binary NamingUsers specify the runtime they wish to use when creating a container. > ctr run --runtime io.containerd.runc.v1 When a user specifies a runtime name,
containerd keeps the Shim AuthoringThis section is dedicated to runtime authors wishing to build a shim. CommandsContainer information is provided to a shim in two ways.
|
The api LGTM overall. One thing is that can we include required events? e.g. TaskExit (required), OOMEvents (optional/required?) etc. |
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
@Random-Liu ok, updated the README with events. |
it's merged! 🎉 🙌👏👏 |
This implements the shimv2 API proposed in #2426. The tests are using the new
io.containerd.oci.v1
runtime under the new TaskManger implementing the v2 APIs.Runtime v2
Runtime v2 introduces a first class shim API for runtime authors to integrate with containerd.
The shim API is minimal and scoped to the execution lifecycle of a container.
Binary Naming
Users specify the runtime they wish to use when creating a container.
The runtime can also be changed via a container update.
> ctr run --runtime io.containerd.runc.v1
When a user specifies a runtime name,
io.containerd.runc.v1
, they will specify the name and version of the runtime.This will be trasnlated by containerd into a binary name for the shim.
io.containerd.runc.v1
->containerd-shim-runc-v1
containerd keeps the
containerd-shim-*
prefix so that users canps aux | grep containerd-shim
to see running shims on their system.Shim Authoring
This section is dedicated to runtime authors wishing to build a shim.
It will detail how the API works and different considerations when building shim.
Commands
Container information is provided to a shim in two ways.
The OCI Runtime Bundle and on the
Create
rpc request.start
Each shim MUST implement a
start
subcommand.This command will launch new shims.
The start command MUST accept the following flags:
-namespace
the namespace for the container-id
the id of the container-address
the address of the containerd's main socket-publish-binary
the binary path to publish events back to containerdThe start command, as well as all binary calls to the shim, has the bundle for the container set as the
cwd
.The start command MUST return an address to a shim for containerd to issue API requests for container operations.
The start command can either start a new shim or return an address to an existing shim based on the shim's logic.
delete
Each shim MUST implement a
delete
subcommand.This command allows containerd to delete any container resources created, mounted, and/or run by a shim when containerd can no longer communicate over rpc.
This happens if a shim is SIGKILL'd with a running container.
These resources will need to be cleaned up when containerd looses the connection to a shim.
This is also used when containerd boots and reconnects to shims.
If a bundle is still on disk but containerd cannot connect to a shim, the delete command is invoked.
The delete command MUST accept the following flags:
-namespace
the namespace for the container-id
the id of the container-address
the address of the containerd's main socket-publish-binary
the binary path to publish events back to containerdThe delete command will be executed in the container's bundle as its
cwd
.Host Level Shim Configuration
containerd does not provide any host level configuration for shims via the API.
If a shim needs configuration from the user with host level information across all instances, a shim specific configuration file can be setup.
Container Level Shim Configuration
On the create request, there is a generic
*protobuf.Any
that allows a user to specify container level configuration for the shim.A shim author can create their own protobuf message for configuration and clients can import and provide this information is needed.
I/O
I/O for a container is provided by the client to the shim via fifo on Linux, named pipes on Windows, or log files on disk.
The paths to these files are provided on the
Create
rpc for the initial creation and on theExec
rpc for additional processes.Containers that are to be launched with an interactive terminal will have the
terminal
field set totrue
, data is still copied over the files(fifos,pipes) in the same way as non interactive containers.Root Filesystems
The root filesytems for the containers is provided by on the
Create
rpc.Shims are responsible for managing the lifecycle of the filesystem mount during the lifecycle of a container.
The mount protobuf message is:
Shims are responsible for mounting the filesystem into the
rootfs/
directory of the bundle.Shims are also responsible for unmounting of the filesystem.
During a
delete
binary call, the shim MUST ensure that filesystem is also unmounted.Filesystems are provided by the containerd snapshotters.
Other
Unsupported rpcs
If a shim does not or cannot implement an rpc call, it MUST return a
github.com/containerd/containerd/errdefs.ErrNotImplemented
error.ttrpc
ttrpc is the only currently supported protocol for shims.
It works with standard protobufs and GRPC services as well as generating clients.
The only difference between grpc and ttrpc is the wire protocol.
ttrpc removes the http stack in order to save memory and binary size to keep shims small.
It is recommended to use ttrpc in your shim but grpc support is also in development.
Closes #2426