-
-
Notifications
You must be signed in to change notification settings - Fork 16.7k
acme: add TLS ALPN challenge #340136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
acme: add TLS ALPN challenge #340136
Conversation
0b02807
to
a512f43
Compare
I saw the error `nix-build -A nixosTests.acme` output
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/prs-ready-for-review/3032/4659 |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/nginx-acme-what-am-i-missing/31636/18 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I don't know if this is related to my changes here or something else with my setup, but I occasionally see the following systemd error after
The interesting part seems to be This only happens sometimes, feels like a race condition somewhere. When I restart the service manually, no error. 🤷 |
I pushed a solution for the systemd service race condition. As there is apparently no built-in systemd mechanism to pause certain services for the time another one runs, workarounds need to be employed. One way is to execute Another approach (thanks @Atemu) is to set With these changes (56f61e3) I also don't need to touch the pre-existing |
This looks good, but I just had one question..
This is by definition what |
That'd be news to me. AFAIK What we need here is a method to pause a service after it has started until a certain unit has finished. |
Yes, I tested many variations with I'm very perplex why systemd doesn't have a mechanism for pausing a service. This must be a common problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that all makes sense. I'm happy to approve this now, but I would be pretty adamant that a test is written up for this feature + the edge cases it handles (one would be suffice). I'm happy to help.
Yes, I'd also like to see this tested, however I have no idea of NixOS tests. If you can give me starting help with the following, I can give it a try:
|
I've been meaning to do so for a while but finally wrote up: systemd/systemd#34954 |
The maintainer approved and I don't see any obvious issues with the diff. I think we're good to merge. @nobodyinperson could you clean up the git commits into sensible steps and make their messages conform with the commit message conventions? |
These are great questions @nobodyinperson 😄 let me see what I can do to answer them..
Yes, the existing acme tests can be extended to cover this use case. Some basics on NixOS tests: The main docs are in the NixOS manual. It is missing a couple of things though, notably a high level overview and instructions on different execution modes.
This leads nicely into question 2 + 3...
The ACME test script creates a few nodes to get the job done:
This results in a fully simulated ACME world, where we have a trusted certificate authority that can generate signed certs using any challenge mechanism. The With all this in mind, adding a test for TLS ALPN should be relatively straight forward. You need to add a specialisation to the webserver which configures the challenge as appropriate, then add an assertion in the testScript to ensure it works. Other maintainers/readers: Would there be value in putting the contents of this comment on the wiki somewhere? It is a hodgepodge of testing info, but perhaps others may find it useful. |
Thanks for the update! I will review this as soon as I can. I'm really intrigued that you found and fixed a deadlock. No worries on the tests - I am rewriting the whole test suite as part of #374792 |
Well, it was a deadlock possibility that this PR would have introduced itself, so yeah... 😅 But still good that it should be fixed now. |
I think it may be very related to a deadlock (or at least a race condition) with HTTP-01 which stems from an architectural issue in the module. Maybe I'm wrong though, but I'll know once I review. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/using-changes-from-a-nixpkgs-pr-in-your-flake/60948/1 |
Hey there. We had a big problem with the test suite that just got solved on Monday. This opens up my own time to help you get this in now 🙂 I'll try review it tonight, should have a moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but you will need to rebase now (my fault, sorry ;) )
Currently it seems like the strategy is to disable Nginx in order to serve the response via the lego server, with that server taking over port 443. I think this can be avoided in proxy servers like Nginx by recognizing the stream {
map $ssl_preread_alpn_protocols $backend {
~\bacme-tls/1\b acme-alpn-01;
default https-server;
}
# ...
# Just need to `proxy_pass localhost:port` if $backend is acme-alpn-01,
# otherwise serve HTTPS like normal.
} |
Here someone put SSL on a different port, but putting lego on a different port sounds way more elegant. That'll be |
That would be used in combo with what @Skyb0rg007 suggested. LetsEncrypt is only going to try port 443, so you would proxy from the web server to Lego. What I would say is: this I already a problem for HTTP-01. Users decide between having acme run on port 80 or another web server which proxies requests to it. The same should be true for tls-01. We would likely want to update the nginx and Httpd modules to provide the option to use tls-alpn, opening up the possibility for not listening on port 80 at all. Whether you do that in this PR or not is up to you, but that will absolutely need accompanying tests to merge. I still don't mind the conflictingServices option. It seems low enough overhead to have it there and give users the option. We don't have to gate keep how a user sets things up by not adding it. |
f266199
to
c3fe25a
Compare
Would be very cool if nginx wouldn't need to be shut down. But this PR provides a working solution for the problem of having port 80 blocked and not controlling DNS, so I'd say let's first finish this and improve later separetely.
Done. Couldn't test it on my live NixOS 24.11 deployment though because the PR patch doesn't apply properly anymore. But the squashed patch before rebasing works fine, so I guess it should be okay. On that note, there's still no tests. |
That's fine for now, I'll try work on a test suite once this is merged. It is much easier to do now. Would you be interested in contributing the components for the web browser proxying of tls-alpn? |
maybe I'll find the time, if it annoys me too much that I have to shut down nginx for the certificates 😉 no promises yet |
That's fine - I was considering doing it myself, so I didn't want to jump the gun if you were already on it. |
Wow, something happened and introduced a massive merge conflict. 😮💨 |
I managed to rebase onto master - Not manually (no chance), not with several versions of manually applying |
Something isn't right. I pushed ca82750 to this branch but it doesn't appear here. Is it a thing that GitHub disconnects forks somehow? 🤔 |
Hm that's really weird, I've never seen this before. Want to try amending the head of the branch then force push? Might wake GitHub up |
This commit adds the `tlsMode`, `tlsPort` and `conflictingServices` options to `services.acme.[...]` to allow using it even if port 80 is blocked and DNS access is impossible. Integration with nginx's enableACME=true is adjusted accordingly.
ca82750
to
dd7670f
Compare
Amending and force-pushing seems to have done it, good tip! 👍 |
Would you be willing to extend the logic added to nginx into the httpd module also? We tend to maintain feature parity for ACME between the two and it would be nice to have there also. |
@m1cr0man Here you have a commit on top/outside this PR that attempts to do this: nobodyinperson@7c6753c I didn't test it as I don't have experience with httpd and the examples in the manual seem needlessly complicated with apache httpd. I also didn't find any apache/httpd/acme nixos tests to run quickly and gave up as I don't have time for this now. |
https://github.com/acmesh-official/acme.sh/wiki/TLS-ALPN-without-downtime#nginx This guide might help us to implement TLS-ALPN without downtime. |
This commit adds the
tlsMode
,tlsPort
andconflictingServices
options tosecurity.acme.[...]
to allow using it even if port 80 is blocked and DNS access is impossible.Initial work based on this commit by @jeschmidt.
I tested these changes successfully (backported to 24.05 though), config here.
Description of changes
Things done
nix.conf
? (See Nix manual)sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Add a 👍 reaction to pull requests you find important.