-
Notifications
You must be signed in to change notification settings - Fork 2.3k
refactor: Simplify Dockerfile
#6320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Everything else was to support `setcap` which should not be required to support running non-root.
Please look into the failed ci/circleci kubernetes-test. On the surface, the change appears to have rendered coredns unresponsive to dns queries, but could of course be some flaw or inadequacy with the test. |
I don't have experience with k8s.
I am familiar with the error though. I would say it's due to the non-root user binding below port 1024. This wouldn't normally be allowed on the host with CoreDNS would it? The binary isn't built/packaged with the capability enforced? It is fine for Docker due to their internal bridge networks relaxing the port restriction. My concern is from their comment in k8s by @vinayakankugoyal , where perhaps they relied on the container to bind to It would seem that you can only keep one group happy 😅
So I guess it's your choice as maintainer which one to favour. I don't have much input for k8s as I don't know how flexible that is with DNS configuration. Docker doesn't seem to allow changing the DNS port for containers with |
This needs a DCO sign-off. You can use |
I'm aware, but in light of the k8s test failure I'll wait for maintainer decision on which route the Docker image should cater to. If |
Perhaps we should roll that back. Thoughts? |
@polarathene I'm sorry for 'reviving' this PR after almost a year. Are you still interested in maintaining it and eventually merging it? As you and others have pointed out and as it has been referenced multiple times it's a bit cumbersome having to add the I tried retracing the discussions in #6249, #5969 and here. As I understand it, this change would be accepted and rolling back #5969 could also be done. I'd be happy to see your change merged :) |
There's not much to maintain, as you can see from the diff it simplified it to just a I'm presently at full capacity elsewhere and likely will be for a while. If someone else wants to pursue the change I welcome that, as it's been effectively stale for over a year since I attempted to push this forward. I'll otherwise eventually get back to this some time in 2025.
The capability is always required for the functionality. The The correct approach is for the devs to implement raising the capability when it's actually needed, otherwise failing. It should be roughly a few lines to implement that, I know how to do so with Rust but I'm not a Go dev, shouldn't be hard though. It would then work regardless of in a container environment or not, no AFAIK there isn't much concern about running as root if you drop the capabilities accordingly, the attacker would need some exploit to escape the container and that does not always require a root user within the container to do so. The capability required here isn't one that is needed for Docker IIRC, and I believe k8s also is making it redundant by setting the associated sysctl to
AFAIK the main benefit of running as non-root is the user does not need to think about extra precautions like dropping capabilities you don't need. I don't stay on top of CVEs related to this but to my knowledge the defaults (at least for Docker) are fairly reasonable and compromising the container should be quite difficult unless the user does something unwise, meanwhile those that want to take the additional precautions are likely to do so already rather than trust various upstreams do it correctly. My experience with rootless is minimal, I've done so recently with Podman, but that seems a far better way to approach it for those that are more security conscious. |
Just to clarify, you don't seem to be marked as involved with the CoreDNS project directly? The project maintainers still seem to be interested in the non-root approach they have in place?: |
@polarathene Correct, I'm not directly involved with the CoreDNS project. Our project (gardener/gardener) consumes CoreDNS, and while not critically important, it'd be nice if we weren't forced to grant the Thank you for your quick response and detailed explanation. I'll try contributing a change where the capability is raised dynamically when required 🤞 |
Oh ok. The only reason you wouldn't be forced is when the related systctl makes it irrelevant I think, otherwise you'd need to grant the capability (even for root if it were dropped IIRC). If it helps, this is from an old comment of mine on the topic, demonstrating a non-root user with # Now with `+p`:
$ docker run --rm -it --user 1000 \
--sysctl net.ipv4.ip_unprivileged_port_start=1024 \
--cap-drop ALL --cap-add CAP_NET_BIND_SERVICE \
cap-test-p inspect
Ambient: []
Inheritable: []
Effective: []
Permitted: ["CAP_NET_BIND_SERVICE"]
Bounding: ["CAP_NET_BIND_SERVICE"]
# Without the `--aware` flag this would fail
# since the cap would not be in the effective set:
$ docker run --rm -it --user 1000 \
--sysctl net.ipv4.ip_unprivileged_port_start=1024 \
--cap-drop ALL --cap-add CAP_NET_BIND_SERVICE \
cap-test-p --aware
CAP_NET_BIND_SERVICE is required to bind a privileged port
CAP_NET_BIND_SERVICE is permitted but missing from the effective set
Successfully added CAP_NET_BIND_SERVICE into the Effective set
Successfully bound to: TcpListener { addr: 127.0.0.1:80, fd: 3 } So if the desired port to bind is below The above output is from a small rust program I wrote a while back to support justifying the change to not require it with CoreDNS, but I didn't get around to publishing it 😓 You can also test similar outside of Docker with systemd with ambient capabilities: $ sudo systemd-run --system --uid 1000 \
--unit cap-test-none \
--collect --pty --quiet \
--property AmbientCapabilities=CAP_NET_BIND_SERVICE \
target/x86_64-unknown-linux-musl/release/capability-aware --aware
CAP_NET_BIND_SERVICE is required to bind a privileged port
The effective set already includes CAP_NET_BIND_SERVICE
Successfully bound to: TcpListener { addr: 127.0.0.1:80, fd: 3 }
# The `inspect` subcommand output (bounding set excluded):
Ambient: ["CAP_NET_BIND_SERVICE"]
Inheritable: ["CAP_NET_BIND_SERVICE"]
Permitted: ["CAP_NET_BIND_SERVICE"]
Effective: ["CAP_NET_BIND_SERVICE"] |
1. Why is this pull request needed and what does it do?
TL;DR:
NET_BIND_SERVICE
capability for any user to bind privileged ports isn't required with Docker unless running with--network host
?Dockerfile
and allow those who enforce droppingNET_BIND_SERVICE
to still run thecoredns
binary on unprivileged ports which is more secure of a choice to support.setcap
applying the capability with effective + permitted set is enforcing a requirement for privileges that aren't always necessary. You can still run as non-root without this.Introduced in March 2023 to support non-root feature. It seems largely unnecessary though?
setcap
mandatingNET_BIND_SERVICE
on the/coredns
binary.COPY
forca-certificates.crt
, the image basestatic-debian11:nonroot
already has that (unless theBASE
referenceARG
is changed). I assume that the base image won't be changing much due to the expectation ofUSER nonroot:nonroot
to be successful, thus value of overwriting the contents seems questionable?scratch
image was used for the final stage, it was more compatible then, but was not merged until the switch tononroot
image which added theUSER
directive.ca-certificates.crt
is also from earlier as an improvement for the final stage withscratch
of the image build. Thus no longer important.On a host network, or perhaps a different container run-time you may need to opt for the
setcap
approach? However due to the change in Docker20.10.0
(Dec 2020), non-root users can already bind to ports below 1024:Restore the sysctl restriction and the capability will have an effect when not applied:
The only reason to use the
setcap
call inDockerfile
is to make the/coredns
binary treated as a "capability-dumb binary":root
user can bypass the restriction.setcap
is only in use to bypass the non-root restriction being respected (for any user in the container to leverage, with any port).2. Which issues (if any) are related?
Last issue raised requesting non-root feature:
Reference of unexpected problems from users affected by the non-root approach (which also changed default
WORKDIR
):Earlier attempts to apply the change:
NOTE: PR #4528 at least included a comment providing better context of the capability intent, but it's not been necessary since Docker
20.10.0
(Dec 2020) which for Docker managed networks drops the<=1024
privileged ports.3. Which documentation changes (if any) need to be made?
N/A?
4. Does this introduce a backward incompatible change or deprecation?
1.10.1
Docker image expectation without mandatory capability enforced. The entirebuild
stage is not needed, it's only purpose was to applysetcap
to a pre-builtcoredns
binary.1.10.1
working directory (I could include that, should just beWORKDIR /
).USER nonroot:nonroot
directive shouldn't be necessary as that's already what the image should run with, I left it alone for the visibility.EXPOSE
likewise is not necessary and is effectively documentation.