Description
Describe the bug:
cert-manager does not remove entries from the fqdnToZone cache when the corresponding SOA record expires.
As a result, the check for the propagated DNS01 ACME challenge TXT record fails.
I hit this while using a delegated DNS01 ACME challenge.
I had made the mistake of not adding NS records for Google Cloud DNS to my CloudFlare domain.
The SOA record from CloudFlare had an 1800-second (30-minute) TTL, but cert-manager never removed the cached entry.
So cert-manager kept returning the cached CloudFlare nameservers.
This in turn made cert-manager always ask the authoritative CloudFlare nameservers for the TXT record. Those servers are not recursive, so the TXT record could never be returned correctly.
(I acknowledge that I could work around the issue by using the recursive DNS server flags.)
Only after the cert-manager controller pod was restarted did it correctly retrieve the SOA record for the subdomain and then ask the Google Cloud DNS server for the TXT record.
Expected behaviour:
When cert-manager runs the FindZoneByFqdn function, it should invalidate cache entries that are older than the SOA TTL.
Steps to reproduce the bug:
Add the -v 4 flag to get debug logging for the cert-manager controller.
While the cert-manager controller is running (without restarting the pod):
Use delegated DNS between two providers such as CloudFlare and Google Cloud DNS, but do not add NS entries in CloudFlare for Google Cloud DNS, only the CNAME record for the TXT challenge.
Monitor the logs and verify that the cert-manager controller discovers and caches the higher-level CloudFlare domain as the zone for the FQDN of the DNS01 ACME challenge.
Look up the TTL for the SOA record from the CloudFlare domain.
Add the proper NS records in CloudFlare to point to Google Cloud DNS.
Wait longer than the TTL.
Verify from the logs that it is still using the incorrect cached data.
Restart the controller pod and verify that it now gets the SOA record from the delegated domain in Google Cloud DNS.
Anything else we need to know?:
Environment details:
- Kubernetes version: 1.28
- Cloud-provider/provisioner: GKE, CloudFlare, Google Cloud DNS
- cert-manager version: 1.11
- Install method: static manifests
/kind bug