Skip to content

Conversation

liron-l
Copy link
Contributor

@liron-l liron-l commented Feb 8, 2016

Following #19995 and #17409 this PR enables skipping userns re-mapping
when creating a container. Thus, enabling privileged containers running
side by side with userns remapped containers.

Also, apparently userns integration tests were under the experiment_test
file and had the experimental build flag on. I've moved the code to a
new file and added validation for the new scenario.

Please note I've intentionally added the new option directly to the
vendor folder to ensure everything works correctly. I will do the
changes in engine-api branch once we agree on the best approach for
solving this problem.

@GordonTheTurtle GordonTheTurtle added status/1-design-review dco/no Automatically set by a bot when one of the commits lacks proper signature labels Feb 8, 2016
@jessfraz
Copy link
Contributor

jessfraz commented Feb 8, 2016

I dunno, I think this could confuse people into thinking their privileged containers are being userns-ed as well if they don't look into the documentation or something... but idk

@jessfraz
Copy link
Contributor

jessfraz commented Feb 8, 2016

ooo can you put the test stuff in a serperate PR we should merge that soon :)

@liron-l
Copy link
Contributor Author

liron-l commented Feb 8, 2016

Thanks @jfrazelle, moved the integration test changes to a separate commit #20117.
Do you think the behavior is still unclear even if you user must explicitly specify --skip-userns-remap or --disable-userns?
@estesp what do you think?

@estesp
Copy link
Contributor

estesp commented Feb 8, 2016

I think there are two things to decide based on this and prior discussions with @mrunalp (and even @rhatdan) on the topic:

  • When user namespaces are enabled daemon-wide, is the right answer either (a) --privileged is not available (this is the case today with the current implementation), or (b) should --privileged "turn off" user namespaces for that one container process/instance and also enable the few additional permissions that --privileged turns on?
  • Should there be a separate "disable user ns" feature (as is implemented in this PR) which is separate from the notion of --privileged which only disables the use of user and group mappings for a specific container instance/process?

My biggest issue, even though it may seem superfluous, with this particular (second case) implementation is that yet-another-flag seems clunky. At least with --privileged it is somewhat clear what is happening: e.g. I want a "super-admin" container in the current daemon runtime of (usually) user-namespaced processes to do something more privileged as real root. The drawback is for the use case of just wanting the lack of uid/gid mappings without the extra/added capabilities of --privileged

Are there cases where having a "--privileged turns off uid/gid mapping" feature is not sufficient for the desired result?

@rhatdan
Copy link
Contributor

rhatdan commented Feb 8, 2016

If you don't make --privileged turn off user namespace, then this is going to break a lot of admin tools that are used to manage the system. This prevents us from considering turning on user namespace. We want to ship SPC (Super Privileged Containers) which can manage the host system and have no need of user namespace.

I am not a fan of what we have done with User Namespace to this point, since it does not give me separation between the containers. But without --privileged turning off all security, I believe we have broken the definition of --privileged.

@liron-l
Copy link
Contributor Author

liron-l commented Feb 9, 2016

There are some use-cases where customers need specific 'enhanced' capabilities (e.g., pid=host) and using full-blown --privileged basically violates the principle of least privilege.

Also, in addition to skipping userns remapping, what additional actions must be taken to enable SPC (with respect to current implementation)?

@mrunalp
Copy link
Contributor

mrunalp commented Feb 9, 2016

Another simpler option that is inline with other namespaces is having something like --userns=host. We can allow --privileged when that option is set.

Sent from my iPhone

On Feb 9, 2016, at 3:53 AM, Phil Estes notifications@github.com wrote:

I think there are two things to decide based on this and prior discussions with @mrunalp (and even @rhatdan) on the topic:

When user namespaces are enabled daemon-wide, is the right answer either (a) --privileged is not available (this is the case today with the current implementation), or (b) should --privileged "turn off" user namespaces for that one container process/instance and also enable the few additional permissions that --privileged turns on?
Should there be a separate "disable user ns" feature (as is implemented in this PR) which is separate from the notion of --privileged which only disables the use of user and group mappings for a specific container instance/process?
My biggest issue, even though it may seem superfluous, with this particular (second case) implementation is that yet-another-flag seems clunky. At least with --privileged it is somewhat clear what is happening: e.g. I want a "super-admin" container in the current daemon runtime of (usually) user-namespaced processes to do something more privileged as real root. The drawback is for the use case of just wanting the lack of uid/gid mappings without the extra/added capabilities of --privileged

Are there cases where having a "--privileged turns off uid/gid mapping" feature is not sufficient for the desired result?


Reply to this email directly or view it on GitHub.

@rhatdan
Copy link
Contributor

rhatdan commented Feb 9, 2016

SPC are basically turning off Security Separation and one or more namespaces. --pid=host --net=host --ipc=host --userns=host ...

You also need to volume mount parts of the Host OS into the container. For example

docker run --privileged -v /:/host fedora chroot /host useradd dwalsh

I wrote about it over a year ago.

http://developerblog.redhat.com/2014/11/06/introducing-a-super-privileged-container-concept/

When you move to Container Hosts platforms like Atomic Host and CoreOS, you want to be able ship all software as containers, but some of those containers need to be able to manage the host or manage other containers.

@estesp
Copy link
Contributor

estesp commented Feb 9, 2016

I think @mrunalp's suggestion is the most attractive option I've heard as it keeps it in line with all other flags that turn off namespace separation and provides "commonality" with the host (or another container).

@rhatdan I'm pretty aware you aren't a fan :) but I assume you are well aware that until we have a solid solution for the conflict between layer sharing & file ownership there is no way to use present-day Docker's graph implementation and have separation between containers. I've blogged about and discussed with others many times the "phase 2" ideas, but for now the choices were (a) do nothing at all with userns because we can't do "everything", or (b) do something limited for now until we have the underlying solution required for custom mappings per container. If you can help us drive that into the Linux VFS and/or filesystem community, the faster we can have a reasonable implementation in Docker.

Also, nothing is broken with --privileged today as it is turned off when user namespaces are enabled in the daemon. You get an error if you try to run a privileged container in Docker 1.10 if user namespaces are enabled. This proposal, and a few other issues linked from the first comment are trying to solve that in a clean way, and when we have agreed on the solution, --privileged will work as you have described.

@rhatdan
Copy link
Contributor

rhatdan commented Feb 9, 2016

I would love to see the VFS thing fixed, but while I have tried to raise these issues within Red Hat, I am not hopeful for anything changing soon. I believe the File System kernel engineers are not keen on fixing it, probably for justifiable reasons. I do have some on my team investigating a chown -R solution, which would at least give us Container Separation, with a slow down on at container creation time, and could cause certain backends to explode in size, but might work OK for Devmapper and BTRFS.

I am in agreement with --userns=HOST as a good solution. But blocking --privileged when you run with user namespace with the daemon, makes no sense to me. If that is the case we should really block --pid=host and --ipc=host, since the intention of seeing proceses on the host or sharing IPC would also be blocked by DAC. Which would just generate customer problems. In SELinux world if you share IPC or PID with the host, then SELinux gets disabled, since it would just blow things up. When you do --PID=container:UUID --IPC=container:UUID, we make sure the SELinux labels work. We need to do similar things with USER NAMESPACE.

Only bad thing I see happening if you run with --privileged and usernamespace turned off would be if the admin writes to the COW file system, his content will get UID==0 rather then UID=DOCKERROOT, and this admin would be responsible for cleaning up the mess. But in most cases where he would be running with --privileged he probably wants the app to just work, and is not concerned about Security separation. And probably would not run the container again without --privileged.

Not sure what you mean by saying nothing is broken by --privileged not being allowed if User Namespace is enabled on the host. --privileged is broken. I would argue that one of the first thing that users do when faced with a container being blocked by "Permission Denied" is to run the container with --privileged, but if they turn on UserNamespace, they will not be allowed.

@liron-l
Copy link
Contributor Author

liron-l commented Feb 9, 2016

So do we have an agreement on one of the following behaviors?

  • --userns=HOST - no user namespaces remapping (allow all operations).
  • --userns= or nothing: same as today (block --privileged, --pid=host, --ipc=host, ....).
  • --userns=HOST - no user namespaces remapping (allow all operation).
  • --privileged - no user namespaces remapping (allow all operation).
  • --userns or nothing: block --pid=host, --ipc=host, .... but not --privileged.

Also, what is the expected behavior if the user specifies userns=uid?

@rhatdan
Copy link
Contributor

rhatdan commented Feb 9, 2016

I would prefer 2. And document the fact that content in the COW image will not have the correct UID/GID when you go to run it with User Namespaces in the future.

@estesp
Copy link
Contributor

estesp commented Feb 9, 2016

Last month in chatting with @mrunalp and others on this topic of mismatched file ownership on new files, I was reminded that the only way that CoW is even involved in this discussion is if we are talking about using docker commit and then tagging this image for running by another user who won't run as privileged and/or with user namespaces disabled.

We can and should make a docs comment about that "commit" scenario, but if you run a privileged container, it has zero impact on the image it was run from, and cannot create files that will be seen by any other user of that image. We can also comment about volumes as volumes shared between user namespace-enabled and disabled containers will have this issue for new files created. But volumes are already an advanced topic, especially when mixing with user namespaces, and there is an understanding this has to be a "self-managed" scenario when you care about file ownership and multiple users in volumes.

@liron-l
Copy link
Contributor Author

liron-l commented Feb 9, 2016

@estesp are you ok with proposal 2?

@estesp
Copy link
Contributor

estesp commented Feb 9, 2016

@liron-l my clarification on proposal 2 is that --privileged and --userns=host would, together, resume today's behavior of --privileged alone without user namespaces enabled.

I would still want a user that asks for --privileged without explicitly specifying --userns=host to have the current Docker 1.10 behavior (when the daemon has user namespaces enabled, of course) of getting an error message that --privileged and user namespaces are conflicting options. This way the user has to explicitly ask for --privileged along in combination with disabling a user namespace mapping for that container. Without this, as @jfrazelle mentioned, there may be an expectation that you are running a privileged container with user namespaces still enabled, which would not be the case.

With the known limitations, I don't think a --userns= flag would have any other options than "host" at first. Future use would depend on kernel capabilities or changes. One nice future use would be the specification of a user/uid/group/gid pair which would control the lookups in /etc/sub{u,g}id for handling per-tenant unique mappings when that "phase 2" capability is developed.

@rhatdan
Copy link
Contributor

rhatdan commented Feb 9, 2016

@estesp I am fine with this solution, and would also be fine with --privileged and userns enabled at the same time. Bottom line what I don't like is no way to be able to run --privileged if I have enabled userns in the docker daemon.

@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Feb 10, 2016
@liron-l
Copy link
Contributor Author

liron-l commented Feb 10, 2016

Thanks @rhatdan, @estesp, I've changed the PR accordingly.
I've also created a PR in docker-archive-public/docker.engine-api#84 with the userns flag @mrunalp suggested.

@liron-l
Copy link
Contributor Author

liron-l commented Feb 15, 2016

Ping @estesp, will appreciate a review for this change and the accompanied engine-api change in docker-archive-public/docker.engine-api#84. Thanks

@thaJeztah thaJeztah added the status/needs-attention Calls for a collective discussion during a review session label Feb 15, 2016
@thaJeztah
Copy link
Member

Maintainers discussion; we're okay with an option where using --privileged requires you to set --userns=HOST as well (not sure which of the above options that is).

@thaJeztah
Copy link
Member

Possibly add a warning somewhere..

@icecrime icecrime added the status/failing-ci Indicates that the PR in its current state fails the test suite label Feb 29, 2016
@thaJeztah
Copy link
Member

ping @liron-l were you working on this?

@liron-l
Copy link
Contributor Author

liron-l commented Feb 29, 2016

Thanks @thaJeztah, I'm waiting for a review for the new parameter in engine-api.
Except from the changes to the API, the code in this commit is complete (and includes testing).

@thaJeztah
Copy link
Member

@liron-l is it currently implemented as #20111 (comment)? (sorry didn't dive into the code)

@liron-l
Copy link
Contributor Author

liron-l commented Feb 29, 2016

@thaJeztah yes.
If usernamespace mode is enabled, If you specify --userns=HOST everything works normally, while without it --privileged fails (same as today).

@thaJeztah
Copy link
Member

@liron-l spotted one more issue, but LGTM otherwise

@liron-l liron-l force-pushed the 19995_skip_user_ns branch from 92fef60 to e4075c8 Compare March 14, 2016 12:20
@liron-l
Copy link
Contributor Author

liron-l commented Mar 14, 2016

Thanks @thaJeztah, fixed it.

@thaJeztah
Copy link
Member

oh, looks like it needs a rebase now 😢 sorry

Following moby#19995 and moby#17409 this PR enables skipping userns re-mapping
when creating a container (or when executing a command). Thus, enabling
privileged containers running side by side with userns remapped
containers.

The feature is enabled by specifying ```--userns:host```, which will not
remapped the user if userns are applied. If this flag is not specified,
the existing behavior (which blocks specific privileged operation)
remains.

Signed-off-by: Liron Levin <liron@twistlock.com>
@liron-l liron-l force-pushed the 19995_skip_user_ns branch from e4075c8 to 6993e89 Compare March 14, 2016 15:09
@liron-l
Copy link
Contributor Author

liron-l commented Mar 14, 2016

Thanks @thaJeztah, rebased.

@thaJeztah
Copy link
Member

LGTM, thank you!

ping @vdemeester @MHBauer @SvenDowideit ptal

@vdemeester
Copy link
Member

Exciting 😄

Just one question : should we say something about windows ? (like it's ignored under windows)

LGTM 🐮

@calavera
Copy link
Contributor

WindowsTP4 networking issues are unrelated to this change. Merging 🎉

calavera added a commit that referenced this pull request Mar 14, 2016
Run privileged containers when userns are specified - feature proposal
@calavera calavera merged commit d853934 into moby:master Mar 14, 2016
@estesp
Copy link
Contributor

estesp commented Mar 16, 2016

Belated congrats @liron-l -- thanks for getting this through the process!

@liron-l
Copy link
Contributor Author

liron-l commented Mar 16, 2016

Thanks @estesp, @thaJeztah , @jfrazelle , @rhatdan for all the help!

@liron-l liron-l deleted the 19995_skip_user_ns branch March 16, 2016 07:01
@thaJeztah
Copy link
Member

You're welcome! Thanks for contributing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.