
Conversation

@elezar elezar (Contributor) commented Mar 10, 2023

- What I did

Added support for CDI devices (as DeviceRequests) to the docker daemon. This is gated by the experimental flag.

- How I did it

A CDI device driver is registered on Daemon startup. This includes the initialisation of the CDI registry which is used to perform the required OCI specification modifications. Incoming DeviceRequests which require the cdi capability (or use the cdi driver) are forwarded to a handler that makes the required modifications to the OCI specification through the instantiated registry.
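To illustrate the shape of that handler, here is a minimal sketch (the function name, error text, and import path below are assumptions for illustration, not necessarily the exact code in this PR):

// Sketch only: shows how DeviceRequests with the cdi capability are turned
// into OCI spec edits via the CDI registry. Names here are assumptions.
package daemon

import (
	"fmt"

	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// cdiHandler patches the container's OCI spec with the edits (device nodes,
// env vars, mounts, hooks) required by the requested CDI device names.
func cdiHandler(s *specs.Spec, cdiDeviceNames []string) error {
	registry := cdi.GetRegistry()
	if _, err := registry.InjectDevices(s, cdiDeviceNames...); err != nil {
		return fmt.Errorf("CDI device injection failed: %w", err)
	}
	return nil
}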

docker/cli#4084 adds support to the Docker CLI for constructing the relevant device requests.

- How to verify it

Prerequisites:

  1. The Daemon includes the changes for this PR
  2. The Daemon is configured in experimental mode
  3. The CLI includes the changes from Support CDI devices in --device flag docker/cli#4084

Create a CDI specification for testing:

cat > vendor.yaml << EOF
cdiVersion: 0.3.0
kind: vendor1.com/device
devices:
- name: test
  containerEdits:
    env:
    - FOO="injected by cdi"
EOF

Copy the generated spec to /var/run/cdi:

sudo mkdir -p /var/run/cdi
sudo cp vendor.yaml /var/run/cdi

Run the docker CLI requesting the device:

docker run --rm -ti --device vendor1.com/device=test  busybox  env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=e9c8e63f6ce2
TERM=xterm
FOO="injected by cdi"
HOME=/root

Running with an undefined CDI device name shows:

docker run --rm -ti --device vendor1.com/device=invalid  busybox  env
docker: Error response from daemon: CDI device injection failed: unresolvable CDI devices vendor1.com/device=invalid.

If the daemon does not support CDI (e.g. it is not running in experimental mode), the following will be shown:

docker run --rm -ti --device vendor1.com/device=test  busybox  env
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [[cdi]].

- Description for the changelog

Add support for CDI devices under Linux. See docker/cli#3864

- A picture of a cute animal (not mandatory but encouraged)

@elezar elezar force-pushed the add-cdi-support branch 3 times, most recently from 6211e0f to d720bfe on March 10, 2023 14:08
return fmt.Errorf("CDI registry not initialized")
}

_, err := cdiRegistry.InjectDevices(s, cdiDeviceNames...)
Contributor

Does this auto-refresh the registry, i.e. to handle CDI Specs added (or removed) after startup?

Contributor Author

Yes, the default mode for the registry is to watch the configured folders (/etc/cdi and /var/run/cdi by default) and update the cached specs on changes. This means that the calls to InjectDevices should always have access to up-to-date specs.
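For reference, a rough sketch of how such a registry is configured (a fragment; the option names are from the upstream cdi package and the values shown are the documented defaults, so this is illustrative rather than the exact call in this PR):

// Sketch: the registry watches the spec directories and refreshes its cache
// when spec files change, so later InjectDevices calls see current specs.
registry := cdi.GetRegistry(
	cdi.WithSpecDirs("/etc/cdi", "/var/run/cdi"), // default locations
	cdi.WithAutoRefresh(true),                    // default behaviour
)
// ociSpec is the container's *specs.Spec (placeholder in this fragment).
_, err := registry.InjectDevices(ociSpec, "vendor1.com/device=test")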

@@ -1203,6 +1203,38 @@ func (s *DockerCLIRunSuite) TestRunAddingOptionalDevicesInvalidMode(c *testing.T
}
}

func (s *DockerCLIRunSuite) TestRunAddingOptionalCdiDevice(c *testing.T) {
Contributor
@TBBle TBBle Mar 16, 2023

It's preferred to add an integration/ test, rather than adding an integration-cli test, as the latter is somewhat legacy. I would guess it could go in integration/container and look like one of the existing "run" tests, but with the desired DeviceRequests entry in the client API request.

That also makes the test independent of the CLI version, so we can land daemon features before the CLI is updated to call them. I'll be surprised if this test passes on CI, since the CLI side of this hasn't been merged yet.

Contributor Author

Yes, I was wondering about this myself. I will have a look at adding an integration/ test instead. Off the top of your head, do you have one that could be used as a starting point?

Contributor
@TBBle TBBle Mar 16, 2023

I had a quick poke around, and for test setup, TestBuildSquashParent wouldn't be a bad place to start, since it's also testing an Experimental-flagged, Linux-only feature. I guess in this case, because you need to write the CDI config to the daemon's host's filesystem, then you won't be able to test with a remote daemon either.

I assume you also need to add the "cdi" feature to the config? The only integration test I noticed that manipulates features is TestPingBuilderHeader, which disables the on-by-default "buildkit" feature, so I guess that's an example for feature config. However, I didn't see the current test enabling this feature, so maybe I'm incorrectly reading the PR as being "default-off"?

For the actual test itself, it doesn't look like any existing integration tests work with hostconfig.DeviceRequests, so perhaps add a WithDeviceRequest helper to integration/internal/container/ops.go, and then you're looking at something quite similar to the tests in integration/container/run_linux_test.go, e.g., TestPrivilegedHostDevices, which also cannot run remotely as it needs to interact with the host daemon's filesystem.
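A hedged sketch of such a helper (the name WithDeviceRequest and the field layout are assumptions based on the existing ops.go pattern, not code from this PR):

// integration/internal/container/ops.go (sketch)
package container

import containertypes "github.com/docker/docker/api/types/container"

// WithDeviceRequest adds a device request to the test container's HostConfig,
// so a test can exercise e.g. a CDI device through the client API directly.
func WithDeviceRequest(req containertypes.DeviceRequest) func(*TestContainerConfig) {
	return func(c *TestContainerConfig) {
		c.HostConfig.DeviceRequests = append(c.HostConfig.DeviceRequests, req)
	}
}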

@elezar elezar force-pushed the add-cdi-support branch 5 times, most recently from b28f110 to ebcd55d on March 17, 2023 13:12
daemon/cdi.go Outdated
Comment on lines 13 to 16
// initializeCdi initializes the CDI registry and registers the CDI device driver.
func initializeCdi(d *Daemon) error {
if !d.HasExperimental() {
return errdefs.NotImplemented(errors.New("CDI is only supported in experimental mode"))
}

return registerCdiDriver()
}
Contributor

This is no longer conditional, so it'll always fail startup in non-experimental mode, and on Windows even in experimental mode.

In this flow, the check probably needs to move down into injectDevices before cdi.GetRegistry() (and registerCdiDriver goes back to being a no-op on non-Linux), which probably means the failure goes back to InvalidArgument, like the checkpoint example you linked earlier.

That way the CDI driver always exists (on Linux) but fails if you try to use it in non-experimental mode.

I'm not a Docker maintainer, but in my opinion there's no need to gate this feature, since it only has effect if you actually send a CDI DeviceRequest to the API; so its presence is pretty non-risky, particularly now that it lazy-initialises the registry.

That said, if there's some consideration that the API or exposure might change in the future (after this makes it into a Docker release), experimental still makes sense, to reflect that.

Contributor Author

Yes, of course. I missed this while trying to simplify things. Need to take another pass at it then.

Contributor Author

What I am thinking is that I will convert registerCdiDriver to a no-op on non-Linux platforms. Since no device driver is ever registered on those platforms, this will either trigger an error if a DeviceRequest which should match cdi is ever detected, or never be triggered.

With regards to gating this behind the experimental flag: this was at the request of the maintainers (/cc @corhere), and as such I will leave it in for this iteration.

I do agree that with the lazy instantiation of the registry the code is only ever triggered when a user actually requested a device (on a supported platform), and as such the risks are reduced.
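To make the no-op plan above concrete, a build-constrained sketch of what the non-Linux side might look like (the file layout and signature are assumptions):

//go:build !linux

package daemon

// registerCDIDriver is a no-op on non-Linux platforms. Since no "cdi" device
// driver is ever registered here, a DeviceRequest targeting the cdi
// capability either fails driver selection or is simply never triggered.
func registerCDIDriver() error {
	return nil
}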

Contributor
@corhere corhere Mar 17, 2023

there's no need to gate this feature, since it only has effect if you actually send a CDI DeviceRequest to the API

The risk we want to mitigate by gating it behind the experimental flag is the risk that people will start to depend on it before the feature has been stabilized, forcing us to support the ostensibly-experimental API forever.

Contributor
@TBBle TBBle Mar 18, 2023

That was kind-of my point: There's not a lot I can envisage changing about this feature (since we're just passing a device string in a known format into an implementation library), and particularly, not a lot I can see that would need to change before removing the experimental flag, i.e. I can't see what the exit-condition here is, except "It's been in the codebase for a while", and unless the Docker release cycle is wildly accelerating, that condition seems (from the outside) pretty close to "Got into a stable release".

Putting this behind experimental (rather than a specific feature flag) actually invokes Hyrum's rule, because anyone who wants to use CDI (which is a deliberate choice at container-run-time anyway, you can't invoke it by accident) has now exposed all users of that daemon instance to also relying on other experimental features they may not realise are experimental. I think experimental makes more sense for behaviour changes, i.e. already-valid things that may do something differently, and are intended to be true in the future. (Edit: This includes things that are stable new features, but may be fully-reverted later. Perhaps CDI fits this? My assumption was it doesn't.)

I also tend to assume that anyone running a non-released build from, e.g., nightlies, is expecting to be exposed to experimental features, but that's probably an over-reach: on reflection, this was a large part of my rationale for suggesting not gating this at all, and that is probably a misalignment on my part.

Anyway, maybe the resolution I'd like to see here is to explicitly document the experiment: What's the condition that will let us remove the experimental gate on this feature, i.e. "When can users start to rely on it"; even if it's something as arbitrary as "Made it into a release and none of the --experimental users reported issues" or similar. And I apologise if there's already a project-wide definition for that which I'm unaware of.

Contributor Author

Due to my typical usage of docker, I have mostly been considering the API from the perspective of the CLI. I think the HostConfig.DeviceRequests that are used are pretty flexible, but I'm not sure whether the API is as clean as what I would like. For example, it seems strange to have to specify a Driver as well as Capabilities. Maybe this can be cleaned up a bit in the refactoring we mentioned, as ideally I would see a user only specify a Driver and DeviceIDs in this case.

On the topic of removing the experimental gate, where would you like to see the discussion and subsequent decision documented?

Contributor Author

I have created #45192 to track what is required to remove the --experimental gating here. Once things have settled in this PR, I can add additional tasks and context there.

@elezar elezar (Contributor Author) commented Mar 17, 2023

Sorry for the back and forth on this (the numerous rebases). I'm still getting familiar with the development environment.

daemon/daemon.go Outdated
@@ -805,6 +805,11 @@ func NewDaemon(ctx context.Context, config *config.Config, pluginStore *plugin.S
if err := d.setGenericResources(config); err != nil {
return nil, err
}

if err := initializeCdi(d); err != nil {
Contributor

NewDaemon is not the ideal place to put code which mutates global state and not the daemon struct. Package ./cmd/dockerd would be a better fit for such global concerns. I think (*DaemonCli).start would be the best place to register the CDI device driver since that function is responsible for loading the daemon config, preparing global state and instantiating the daemon object.

Contributor Author

OK. Will update with that in mind.

Contributor Author

I've just been looking at this and one question I have: registering the device driver currently sets a package-global map in the daemon package. This means that the InitializeCDI function needs to be public. Is this ok, and if so, would we prefer a function on the *Daemon type, or a package-level function? If the latter is used, should we perform the experimental check in the (*DaemonCli).start function instead?

Another option would be to refactor the driver registration in preparation for this and move all the CDI code to the CLI.

Contributor

Exporting the InitializeCDI function is fine with me. It should not be associated with *Daemon, for the same reason it does not belong in the NewDaemon constructor: it affects global state, not a particular Daemon instance. Perform the experimental check in (*DaemonCli).start.

Another option would be to refactor the driver registration in preparation for this and move all the CDI code to the CLI.

I would be happy to see (and review) a refactor of the driver registration code, but neither that nor the bulk of the CDI code belongs in package cmd/dockerd. Now, if we're talking refactoring, I would love to see the device-driver registration code and handleDevice function extracted to an object which is instantiated and dependency-injected into the daemon instance, like the pluginStore.

Contributor

While we're talking 4-year-old refactoring TODOs...

// TODO: nvidia should not be hard-coded, and should be a device plugin instead on the daemon object.

Contributor Author

From the NVIDIA perspective, we see the use of CDI replacing the --gpus flag in the long run. The pointers on doing this better this time around are much appreciated.

I will have a look at the pluginStore and take a pass at the refactoring in a follow-up!

@elezar elezar force-pushed the add-cdi-support branch 2 times, most recently from 31d0641 to b7d4976 on March 18, 2023 13:45
)
defer client.ContainerRemove(ctx, id, types.ContainerRemoveOptions{Force: true})

reader, err := client.ContainerLogs(ctx, id, types.ContainerLogsOptions{
Contributor Author

@corhere I initially tried to implement something that inspects the container here, but did not see the expected environment variable. This is probably because the container config is being based on the unmodified spec, so if the expectation is that the config shows the new envvar, then we would need to do some kind of reconciliation after making the modifications. How is this currently handled for other entities (e.g. devices or mounts that are requested)?

Contributor

I had been wondering about that. I noticed that when this was raised before (docker/cli#3864 (comment)) it seemed the preference was to indeed have the spec shown by inspect be updated to match what was actually run, and that the CDI library has some API to support that.

That said, it seems like this could be solved at the moby level by capturing the OCI spec after all the DeviceRequests are processed and returning that in docker inspect; that would also capture the --gpu spec mutations which I assume are equally invisible, as I don't see anything in daemon/nvidia_linux.go apart from the updateSpec hook implementation.

Contributor Author

Yes, at present the --gpu spec mutations are also not visible to the inspect command -- although in this case the changes are limited to a single hook and setting two environment variables.

I agree that the goal should be to get these changes visible to the daemon, but I would suggest tackling this in a follow-up if possible. Currently, I don't believe that any of the container engines that offer CDI support do this kind of reconciliation -- unless they use the on-disk config.json file as the source of truth when querying a container.

Contributor

The container's OCI spec is based on the container config, not the other way around. The container config is the source of truth. No reconciliation is performed for other entities. I really wish we'd thought to record the maintainers call which you joined where we hashed this all out, because I vaguely recall we concluded that reporting the CDI device request as-is when inspecting the container, without back-propagating how it patches the OCI spec, would be acceptable: it is sufficient to flag that the container's OCI spec has been patched by CDI, and therefore any issues reported with that particular container can be assumed to be the responsibility of the CDI spec's vendor until proven otherwise. Reaching a final decision on this aspect of CDI integration is one of the gates for the feature to graduate from experimental @TBBle.

Contributor
@TBBle TBBle Mar 20, 2023

Would it be possible to capture the applied CDI spec into the container config or metadata in some way? One of the advantages of CDI is that the specs can be dynamic, e.g. generated just-in-time for that container, and then discarded afterwards, or even updated in-place, I believe.

This could make docker inspect very hard to backtrace for troubleshooting, particularly in the context of, say, cri-dockerd which would (I assume) simply translate the CDI field in CRI into DeviceRequests in the Docker API, but which may be referencing a dynamically-generated CDI spec created just-in-time by a kubernetes Device Plugin (or the newer Dynamic Resource Allocation which is more-suited to just-in-time device creation, e.g., Nvidia software-splittable GPUs), and hence the CDI name is meaningless (probably a GUID or similar) by that point.

Contributor

I was not aware that dynamic CDI specs were a thing. Given that, some mechanism for reporting the applied CDI spec when inspecting a container will be necessary. What affordances does cri-containerd provide for troubleshooting containers which have dynamic CDI specs applied to them?

Contributor
@TBBle TBBle Mar 20, 2023

IIRC, in containerd-cri, CDI is applied to the OCI spec before it's passed to the runtime shim, so the config.json on-disk will have the results of applying the CDI annotations. I'm not sure what debugging options are available at the CRI layer though (e.g., can we dump the historical CRI requests that led to this container's state), and I haven't looked into CDI's support for this, beyond the mention in docker/cli#3864 (comment) that it has APIs to support inspection; so it's possible containerd-cri doesn't currently expose CDI to inspection, just the resulting OCI spec.

Edit: Quick check, and CDI is applied entirely by the containerd-cri plugin, so what the containerd core service gets is with CDI already applied, and so containerd itself is totally agnostic to CDI.

More edit, really going to bed after this: As it happens, right now, critool inspect should show annotations on the container, which I think will include the CDI annotations, but once we move to using the new CDIDevices field in CRI, this back-door goes away. I think critool inspect will show the config.Resources.CDIDevices field, but haven't actually tried this (or the annotations check), this was just from GitHub-based code exploring.

So anyway, I think for containerd-cri we get to see the CDI device name in the CRI API, and the OCI spec that has had the edits applied in containerd core, but the same risk of "cdi spec now isn't the cdi spec used" applies.


For dynamic CDI specs, NVidia already has a non-production-ready Dynamic Resource Allocation setup for Multi-Instance GPUs using CDI, creating a CDI spec for claims on-the-fly (see CreateClaimSpecFile). In this case, the dynamic CDI spec is kept around until the resource is freed, so a live container is traceable; it may be that modifying a spec file or deleting it while the device exists is a CDI antipattern, but at this time, I'm not aware of that being explicit. cncf-tags/container-device-interface#15 is the only relevant discussion I'm currently aware of for this sort of use-case.

Contributor

What you described for containerd-cri would also hold true verbatim for Moby if you substitute the analogous components:

  • containerd-cri -> dockerd
  • critool inspect -> docker inspect
  • CRI API -> Docker Engine API

You are correct that cri-dockerd is an adapter between the CRI API and Docker Engine API and so would implement CDI support by translating the config.Resources.CDIDevices field into Engine API DeviceRequests. The information it could include when inspecting a container through the CRI API would be limited by the information it could request over the Engine API.

I have no objections to the Moby+CDI troubleshooting experience being on par with cri-containerd+CDI's troubleshooting experience. I won't complain if you want to go above and beyond, either.

Contributor Author

Thanks for the discussion here @TBBle and for diving into how things are handled in containerd (and similarly in cri-o). Note that assuming that docker inspect returns a ContainerJSON struct which includes the DeviceRequests in the HostConfig, there will be a reference to the requested CDI device names in the docker inspect output. This would match what is currently available in other implementations since these also do not read the generated OCI specification for their inspect equivalents. This also matches what is currently available for the --gpus flag.

"DeviceRequests": [
                {
                    "Driver": "",
                    "Count": 4,
                    "DeviceIDs": null,
                    "Capabilities": [
                        [
                            "gpu"
                        ]
                    ],
                    "Options": {}
                }
            ],

With regards to:

Would it be possible to capture the applied CDI spec into the container config or metadata in some way?

One thing to investigate would be to "inject" the requested devices into an empty OCI spec and then store this. This should give a reasonable representation of the edits required for a set of CDI devices. Note that the feasibility of this would have to be properly investigated.
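A minimal sketch of that idea (exploratory only, not part of this PR; the helper name is an assumption, using the same cdi and specs imports as the earlier sketch):

// Sketch: capture the edits a set of CDI device names implies by injecting
// them into an empty OCI spec and recording the result.
func captureCDIEdits(deviceNames ...string) (*specs.Spec, error) {
	empty := &specs.Spec{}
	// InjectDevices applies the containerEdits from the resolved CDI specs.
	if _, err := cdi.GetRegistry().InjectDevices(empty, deviceNames...); err != nil {
		return nil, err
	}
	return empty, nil // holds only the CDI-injected edits
}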

Now from a testing perspective, checking the HostConfig doesn't necessarily make sense for this test since this is what we set as input, so I think parsing the logs is fine here.

Contributor

I'd suggest including a HostConfig inspection check in the test, simply because that's a publicly visible API we'll want to maintain, i.e. cri-dockerd's implementation of critool inspect will depend on this working, and there's no harm and little cost in making that explicitly tested, since unit tests are, amongst other things, a form of documentation.
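For example, such a check might look roughly like this (a sketch; the device name and the gotest.tools assertion helpers are assumptions, following the conventions used elsewhere in integration/):

// Sketch: assert that the CDI device request round-trips through inspect.
inspect, err := client.ContainerInspect(ctx, id)
assert.NilError(t, err)
assert.Check(t, is.Len(inspect.HostConfig.DeviceRequests, 1))
// The request is reported as-is; the OCI spec edits themselves are not
// back-propagated (see the discussion above).
assert.Check(t, is.DeepEqual(
	inspect.HostConfig.DeviceRequests[0].DeviceIDs,
	[]string{"vendor1.com/device=test"}, // assumed device name
))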

@elezar elezar force-pushed the add-cdi-support branch 3 times, most recently from 9400a47 to 78590e1 on March 21, 2023 10:29
@elezar elezar marked this pull request as ready for review March 21, 2023 11:00
Contributor
@TBBle TBBle left a comment

Seems reasonably simple and fine to me, my only comments here are not functionality-related.

That said, I note that CI hasn't been approved for this PR yet, so I'm assuming that the tests will pass when they are run.

Comment on lines 25 to 23
skip.If(t, !testEnv.DaemonInfo.ExperimentalBuild, "experimental build required for CDI devices")
skip.If(t, testEnv.IsRemoteDaemon, "cannot run daemon when remote daemon")
Contributor

It looks like these two cases are handled when setting client, since it looks like we can start an experimental-enabled daemon for the test if not using a remote daemon.

If this is not possible due to, e.g., fakecontext, then keep these skips, but remove the code that attempts to start an experimental daemon, i.e. lines 46-51.

Contributor Author

To be honest, I'm not sure what is possible here. I do want this test to only be run in experimental mode (although I could add a negative test for non-experimental mode), which is the motivation for the skip.If(t, !testEnv.DaemonInfo.ExperimentalBuild, ...) check.

I am unsure of how this interacts with a remote daemon which makes skipping on a remote daemon simpler here.

Note that I don't think there's a specific requirement that the daemon be local, just that the CDI specification be readable by the daemon. If there is a way to create the spec file for the test so that it works with a remote daemon, I can update it.

I will remove the redundant lines below.

Contributor

Note that I don't think there's a specific requirement that the daemon be local, just that the CDI specification be readable by the daemon. If there is a way to create the spec file for the test so that it works with a remote daemon, I can update it.

There isn't really a way to create the spec file for a remote daemon since, by definition, the environment the remote daemon is running in is outside of the test's control. So there is a specific requirement that the daemon be local.

Contributor Author

Thanks for the clarification. I meant when using CDI outside of the test context. The only requirement would be that the spec be readable by the daemon. That means that if there were a mechanism to create the spec file on a remote host where the daemon was running and have it readable there, we would expect CDI injection to work as expected.

From your comment I would say that it is out of scope for this particular test.

Contributor

"Remote daemon" is a concept which is exclusive to the integration test environment.

@elezar elezar force-pushed the add-cdi-support branch 2 times, most recently from 935f471 to fc3f667 on March 21, 2023 17:04

package daemon

// registerCDIDriver is a no-op for non-Linux platforms.
Contributor

Is there some technical limitation behind why CDI is only supported on Linux, or is it more a case of "it doesn't make sense on other platforms"? Given that the meat of the implementation is inside an unconstrained file, I suspect it's the latter. Would you mind adding a code comment to document for posterity the reasons why CDI is only supported on Linux?

I'm wondering if it would make for more readable code to eschew the build constraints and instead check the OS at "runtime", i.e.:

// cmd/dockerd/daemon.go

if runtime.GOOS == "linux" && cli.Config.Experimental {
	daemon.RegisterCDIDriver()
}

The compiler is smart enough to constant-propagate the whole conditional block down to nothing on non-Linux platforms.

Contributor Author

There are no fundamental limitations that require Linux for CDI, but the OCI spec injection code implemented in the cdi package is currently Linux-specific. With that said, I don't know what the timeline would be for extending this to other platforms (e.g. Windows), but I suppose reworking the code here at that time would make more sense than overcomplicating it now.

Contributor

the OCI spec injection code implemented in the cdi package is currently linux-specific

That's exactly the sort of technical limitation I was looking for. Please document that information so the next person to look at this code knows why the limitation is in place and when it can be lifted.

Contributor

I realise cncf-tags/container-device-interface#28 exists, but AFAICT the general CDI injection code is not Linux-specific, just the support for copying attributes from host-device nodes, i.e. the current test, which is env-var-setting-only, should be workable on Windows as far as CDI goes.

That said, the Docker DeviceRequests implementation setup is currently Linux-only, so until that's taken care of (perhaps by the refactoring to use dependency injection as mooted elsewhere in this PR) an integration test won't be implementable.

(I also just noticed a bug in the CDI code I linked, as it will fail even if the device node specification is already complete, which blocks, e.g., LCOW support, although that's not really an issue for Docker until and if we connect up containerd for Windows and the uvm-based LCOWv2)

Contributor Author

No, the general code is not Linux-specific, but as you pointed out, the function to retrieve information from the host device is (and is implemented as a no-op on non-Linux platforms). One thing that is not clear on first reading is that the calls to specgen.RemoveDevice() and specgen.AddDevice() assume Linux devices, so these would also need to be replaced with equivalent calls such as specgen.AddWindowsDevices.

I would also expect that the CDI spec would need some extension to cleanly handle Windows devices.

(I also just noticed a bug in the CDI code I linked, as it will fail even if the device node specification is already complete, which blocks, e.g., LCOW support, although that's not really an issue for Docker until and if we connect up containerd for Windows and the uvm-based LCOWv2)

Could you update cncf-tags/container-device-interface#131 with more context? (I will try to come up with a quick reproducer later today if I get a chance.)

Contributor

One thing that is not clear on first reading is that the calls to specgen.RemoveDevice() and specgen.AddDevice() assume Linux devices, so these would also need to be replaced with equivalent calls such as specgen.AddWindowsDevices.

That's a different problem, cncf-tags/container-device-interface#28 is tracking that.

I'm talking about Windows-hosted builds specifying Linux device nodes, e.g. for LCOW cases, or I guess kata-containers or similar VM-managed systems.

I've added a comment to cncf-tags/container-device-interface#131 to try and clarify, and am happy to continue discussion there if I'm still unclear, since none of the cases where this matters apply to Docker today.

Comment on lines 37 to 39
cdifile := fakecontext.New(
t,
"/var/run/cdi",
fakecontext.WithFile("vendor1-device.yaml", cdiSpec),
)
defer cdifile.Close()
Contributor

That's not what fakecontext is intended for, so it's surprising that the file /var/run/cdi/vendor1-device.yaml is created on the host the test is running on. Is there some way to integration-test CDI without mutating the live host state? Could you point a daemon at a test fixture directory or temporary directory when looking for CDI specs? Otherwise, please make it very explicit that this test is tampering with the real live /var/run tree by writing and cleaning up the file using the standard library os package.
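For instance, the explicit variant could look roughly like this (a sketch; the paths and cleanup shape are assumptions, with cdiSpec being the spec content already defined in the test):

// Sketch: write the CDI spec for the test explicitly via the os package and
// clean it up afterwards, so the tampering with host state is obvious.
specDir := "/var/run/cdi" // or a directory the daemon is pointed at for CDI specs
assert.NilError(t, os.MkdirAll(specDir, 0o755))
specPath := filepath.Join(specDir, "vendor1-device.yaml")
assert.NilError(t, os.WriteFile(specPath, []byte(cdiSpec), 0o644))
t.Cleanup(func() { _ = os.Remove(specPath) })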

Contributor Author

OK. I can make it more explicit. I have already added a commit that allows the path where CDI spec files are located to be specified. This does require a daemon config though, so it would be good to get some guidance as to how you would prefer this plumbed through. For now I have added a --cdi-spec-dir arg and set this to ${DEST}/cdi, creating the file there instead of in /var/run/cdi.

@elezar elezar (Contributor Author) commented Apr 15, 2023

I don't love the amount of added vendor code (especially several new indirects), but it does make the actual Moby bits reasonably small. 🤷 👍

Thanks @tianon. Yes, the dependencies that we pulled in via the container-device-interface packages have also been a point of discussion in our upstream k8s work. We are working on reducing the required dependencies -- or at least ensuring that these are only pulled in when needed. Note that in this case, the use of the cdi.Cache type and subsequent import of github.com/opencontainers/runtime-tools cannot be avoided since this is what is being used to perform the actual OCI spec manipulation. We are working with the upstream maintainer to also improve the dependency situation there.

@@ -61,6 +61,8 @@ func installCommonConfigFlags(conf *config.Config, flags *pflag.FlagSet) error {
flags.StringVar(&conf.HTTPSProxy, "https-proxy", "", "HTTPS proxy URL to use for outgoing traffic")
flags.StringVar(&conf.NoProxy, "no-proxy", "", "Comma-separated list of hosts or IP addresses for which the proxy is skipped")

flags.Var(opts.NewNamedListOptsRef("cdi-spec-dirs", &conf.CDISpecDirs, nil), "cdi-spec-dir", "CDI specification directories to use")
Member

In future we could expose this information in docker info

Contributor Author

Sure. I have added an item to #45192 where I'm tracking the minor follow-ups.

Member
@thaJeztah thaJeztah left a comment

LGTM

@thaJeztah thaJeztah added this to the 25.0.0 milestone May 4, 2023
@cpuguy83 cpuguy83 (Member) commented May 4, 2023

vendor.sum conflict again.

@elezar elezar force-pushed the add-cdi-support branch from af9b80c to fcb4a41 on May 4, 2023 20:30
@elezar elezar (Contributor Author) commented May 4, 2023

vendor.sum conflict again.

Rebased and updated in latest. (fcb4a41)

@elezar elezar requested review from tianon and thaJeztah May 9, 2023 08:53
elezar added 4 commits May 16, 2023 17:07
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes add basic CDI integration to the docker daemon.

A cdi driver is added to handle cdi device requests. This
is gated by an experimental feature flag and is only supported on Linux.

This change also adds a CDISpecDirs (cdi-spec-dirs) option to the config.
This allows the default values of `/etc/cdi` and `/var/run/cdi` to be overridden,
which is useful for testing.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the add-cdi-support branch from fcb4a41 to c2630c9 on May 16, 2023 15:12
@elezar elezar (Contributor Author) commented May 16, 2023

@tianon @thaJeztah I have updated the vendoring again. Is there anything else that's required here?

Labels: area/daemon, impact/changelog, kind/feature, status/2-code-review