-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Run privileged containers when userns are specified - feature proposal #20111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I dunno, I think this could confuse people into thinking their privileged containers are being userns-ed as well if they don't look into the documentation or something... but idk |
ooo can you put the test stuff in a serperate PR we should merge that soon :) |
Thanks @jfrazelle, moved the integration test changes to a separate commit #20117. |
I think there are two things to decide based on this and prior discussions with @mrunalp (and even @rhatdan) on the topic:
My biggest issue, even though it may seem superfluous, with this particular (second case) implementation is that yet-another-flag seems clunky. At least with Are there cases where having a " |
If you don't make --privileged turn off user namespace, then this is going to break a lot of admin tools that are used to manage the system. This prevents us from considering turning on user namespace. We want to ship SPC (Super Privileged Containers) which can manage the host system and have no need of user namespace. I am not a fan of what we have done with User Namespace to this point, since it does not give me separation between the containers. But without --privileged turning off all security, I believe we have broken the definition of --privileged. |
There are some use-cases where customers need specific 'enhanced' capabilities (e.g., pid=host) and using full-blown Also, in addition to skipping userns remapping, what additional actions must be taken to enable SPC (with respect to current implementation)? |
Another simpler option that is inline with other namespaces is having something like --userns=host. We can allow --privileged when that option is set. Sent from my iPhone
|
SPC are basically turning off Security Separation and one or more namespaces. --pid=host --net=host --ipc=host --userns=host ... You also need to volume mount parts of the Host OS into the container. For example
I wrote about it over a year ago. http://developerblog.redhat.com/2014/11/06/introducing-a-super-privileged-container-concept/ When you move to Container Hosts platforms like Atomic Host and CoreOS, you want to be able ship all software as containers, but some of those containers need to be able to manage the host or manage other containers. |
I think @mrunalp's suggestion is the most attractive option I've heard as it keeps it in line with all other flags that turn off namespace separation and provides "commonality" with the host (or another container). @rhatdan I'm pretty aware you aren't a fan :) but I assume you are well aware that until we have a solid solution for the conflict between layer sharing & file ownership there is no way to use present-day Docker's graph implementation and have separation between containers. I've blogged about and discussed with others many times the "phase 2" ideas, but for now the choices were (a) do nothing at all with userns because we can't do "everything", or (b) do something limited for now until we have the underlying solution required for custom mappings per container. If you can help us drive that into the Linux VFS and/or filesystem community, the faster we can have a reasonable implementation in Docker. Also, nothing is broken with |
I would love to see the VFS thing fixed, but while I have tried to raise these issues within Red Hat, I am not hopeful for anything changing soon. I believe the File System kernel engineers are not keen on fixing it, probably for justifiable reasons. I do have some on my team investigating a I am in agreement with --userns=HOST as a good solution. But blocking --privileged when you run with user namespace with the daemon, makes no sense to me. If that is the case we should really block --pid=host and --ipc=host, since the intention of seeing proceses on the host or sharing IPC would also be blocked by DAC. Which would just generate customer problems. In SELinux world if you share IPC or PID with the host, then SELinux gets disabled, since it would just blow things up. When you do --PID=container:UUID --IPC=container:UUID, we make sure the SELinux labels work. We need to do similar things with USER NAMESPACE. Only bad thing I see happening if you run with --privileged and usernamespace turned off would be if the admin writes to the COW file system, his content will get UID==0 rather then UID=DOCKERROOT, and this admin would be responsible for cleaning up the mess. But in most cases where he would be running with --privileged he probably wants the app to just work, and is not concerned about Security separation. And probably would not run the container again without --privileged. Not sure what you mean by saying nothing is broken by --privileged not being allowed if User Namespace is enabled on the host. --privileged is broken. I would argue that one of the first thing that users do when faced with a container being blocked by "Permission Denied" is to run the container with --privileged, but if they turn on UserNamespace, they will not be allowed. |
So do we have an agreement on one of the following behaviors?
Also, what is the expected behavior if the user specifies |
I would prefer 2. And document the fact that content in the COW image will not have the correct UID/GID when you go to run it with User Namespaces in the future. |
Last month in chatting with @mrunalp and others on this topic of mismatched file ownership on new files, I was reminded that the only way that CoW is even involved in this discussion is if we are talking about using We can and should make a docs comment about that "commit" scenario, but if you run a privileged container, it has zero impact on the image it was run from, and cannot create files that will be seen by any other user of that image. We can also comment about volumes as volumes shared between user namespace-enabled and disabled containers will have this issue for new files created. But volumes are already an advanced topic, especially when mixing with user namespaces, and there is an understanding this has to be a "self-managed" scenario when you care about file ownership and multiple users in volumes. |
@estesp are you ok with proposal 2? |
@liron-l my clarification on proposal 2 is that I would still want a user that asks for With the known limitations, I don't think a |
@estesp I am fine with this solution, and would also be fine with --privileged and userns enabled at the same time. Bottom line what I don't like is no way to be able to run --privileged if I have enabled userns in the docker daemon. |
a85891f
to
99f9abb
Compare
Thanks @rhatdan, @estesp, I've changed the PR accordingly. |
Ping @estesp, will appreciate a review for this change and the accompanied engine-api change in docker-archive-public/docker.engine-api#84. Thanks |
Maintainers discussion; we're okay with an option where using |
Possibly add a warning somewhere.. |
ping @liron-l were you working on this? |
Thanks @thaJeztah, I'm waiting for a review for the new parameter in engine-api. |
@liron-l is it currently implemented as #20111 (comment)? (sorry didn't dive into the code) |
@thaJeztah yes. |
@liron-l spotted one more issue, but LGTM otherwise |
92fef60
to
e4075c8
Compare
Thanks @thaJeztah, fixed it. |
oh, looks like it needs a rebase now 😢 sorry |
Following moby#19995 and moby#17409 this PR enables skipping userns re-mapping when creating a container (or when executing a command). Thus, enabling privileged containers running side by side with userns remapped containers. The feature is enabled by specifying ```--userns:host```, which will not remapped the user if userns are applied. If this flag is not specified, the existing behavior (which blocks specific privileged operation) remains. Signed-off-by: Liron Levin <liron@twistlock.com>
e4075c8
to
6993e89
Compare
Thanks @thaJeztah, rebased. |
LGTM, thank you! ping @vdemeester @MHBauer @SvenDowideit ptal |
Exciting 😄 Just one question : should we say something about windows ? (like it's ignored under windows) LGTM 🐮 |
WindowsTP4 networking issues are unrelated to this change. Merging 🎉 |
Run privileged containers when userns are specified - feature proposal
Belated congrats @liron-l -- thanks for getting this through the process! |
Thanks @estesp, @thaJeztah , @jfrazelle , @rhatdan for all the help! |
You're welcome! Thanks for contributing! |
Following #19995 and #17409 this PR enables skipping userns re-mapping
when creating a container. Thus, enabling privileged containers running
side by side with userns remapped containers.
Also, apparently userns integration tests were under the experiment_test
file and had the experimental build flag on. I've moved the code to a
new file and added validation for the new scenario.
Please note I've intentionally added the new option directly to the
vendor folder to ensure everything works correctly. I will do the
changes in engine-api branch once we agree on the best approach for
solving this problem.