Using 1.16.0, the ACME issuer using HTTP-01 solver with Gateway API is broken #7337

@stelucz

📢 This issue has been addressed in cert-manager 1.18.1: https://github.com/cert-manager/cert-manager/releases/tag/v1.18.1

ℹ Read the cert-manager 1.18 release-notes to learn more.

Describe the bug:
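Issuing a new certificate fails. The HTTP-01 propagation check returns 404 and then 503 instead of 200, and the controller eventually logs "error waiting for authorization" and "unexpected non-ACME API error", both with "context deadline exceeded".
Controller logs: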

I1005 19:55:21.630885       1 conditions.go:203] Setting lastTransitionTime for Certificate "<DOMAIN>-tls" condition "Ready" to 2024-10-05 19:55:21.630871806 +0000 UTC m=+276.682156984
I1005 19:55:21.631690       1 trigger_controller.go:223] "Certificate must be re-issued" logger="cert-manager.controller" key="gateway/<DOMAIN>-tls" reason="DoesNotExist" message="Issuing certificate as Secret does not exist"
I1005 19:55:21.636969       1 conditions.go:203] Setting lastTransitionTime for Certificate "<DOMAIN>-tls" condition "Issuing" to 2024-10-05 19:55:21.636948435 +0000 UTC m=+276.688233644
I1005 19:55:21.657677       1 controller.go:152] "re-queuing item due to optimistic locking on resource" logger="cert-manager.controller" error="Operation cannot be fulfilled on certificates.cert-manager.io \"<DOMAIN>-tls\": the object has been modified; please apply your changes to the latest version and try again"
I1005 19:55:21.657748       1 trigger_controller.go:223] "Certificate must be re-issued" logger="cert-manager.controller" key="gateway/<DOMAIN>-tls" reason="DoesNotExist" message="Issuing certificate as Secret does not exist"
I1005 19:55:21.657774       1 conditions.go:203] Setting lastTransitionTime for Certificate "<DOMAIN>-tls" condition "Issuing" to 2024-10-05 19:55:21.657766356 +0000 UTC m=+276.709051535
I1005 19:55:22.310137       1 conditions.go:263] Setting lastTransitionTime for CertificateRequest "<DOMAIN>-tls-1" condition "Approved" to 2024-10-05 19:55:22.310126399 +0000 UTC m=+277.361411567
I1005 19:55:22.334507       1 conditions.go:263] Setting lastTransitionTime for CertificateRequest "<DOMAIN>-tls-1" condition "Ready" to 2024-10-05 19:55:22.334488823 +0000 UTC m=+277.385774012
W1005 19:55:23.364319       1 warnings.go:70] metadata.finalizers: "finalizer.acme.cert-manager.io": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers
I1005 19:55:23.573713       1 pod.go:71] "creating HTTP01 challenge solver pod" logger="cert-manager.controller.http01.ensurePod" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01"
I1005 19:55:23.636553       1 httproute.go:67] "getting httpRoutes for challenge" logger="cert-manager.controller.http01.getGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
I1005 19:55:23.636622       1 httproute.go:47] "creating HTTPRoute for challenge" logger="cert-manager.controller.http01.ensureGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
I1005 19:55:23.652780       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-88cmh" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:23.652854       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-gfhk7" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:23.652889       1 httproute.go:67] "getting httpRoutes for challenge" logger="cert-manager.controller.http01.selfCheck.http01.getGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
I1005 19:55:23.652934       1 httproute.go:55] "Found existing HTTPRoute for challenge" logger="cert-manager.controller.http01.selfCheck.http01.ensureGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
E1005 19:55:23.719614       1 sync.go:208] "propagation check failed" err="wrong status code '404', expected '200'" logger="cert-manager.controller" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01"
I1005 19:55:23.737737       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-88cmh" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:23.737816       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-gfhk7" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:23.737856       1 httproute.go:67] "getting httpRoutes for challenge" logger="cert-manager.controller.http01.selfCheck.http01.getGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
I1005 19:55:23.738070       1 httproute.go:55] "Found existing HTTPRoute for challenge" logger="cert-manager.controller.http01.selfCheck.http01.ensureGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
E1005 19:55:23.760244       1 sync.go:208] "propagation check failed" err="wrong status code '503', expected '200'" logger="cert-manager.controller" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01"
I1005 19:55:23.773927       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-88cmh" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:23.774259       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-gfhk7" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:23.774420       1 httproute.go:67] "getting httpRoutes for challenge" logger="cert-manager.controller.http01.selfCheck.http01.getGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
I1005 19:55:23.774576       1 httproute.go:55] "Found existing HTTPRoute for challenge" logger="cert-manager.controller.http01.selfCheck.http01.ensureGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
E1005 19:55:23.791886       1 sync.go:208] "propagation check failed" err="wrong status code '503', expected '200'" logger="cert-manager.controller" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01"
I1005 19:55:33.721279       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-88cmh" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:33.721950       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" related_resource_name="cm-acme-http-solver-gfhk7" related_resource_namespace="gateway" related_resource_kind="" related_resource_version=""
I1005 19:55:33.722048       1 httproute.go:67] "getting httpRoutes for challenge" logger="cert-manager.controller.http01.selfCheck.http01.getGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
I1005 19:55:33.722158       1 httproute.go:55] "Found existing HTTPRoute for challenge" logger="cert-manager.controller.http01.selfCheck.http01.ensureGatewayHTTPRoute" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01" name="<DOMAIN>-tls-1-2127692413-1871217317" namespace="gateway"
E1005 19:56:01.969322       1 sync.go:403] "error waiting for authorization" err="context deadline exceeded" logger="cert-manager.controller.acceptChallenge" resource_name="<DOMAIN>-tls-1-2127692413-1871217317" resource_namespace="gateway" resource_kind="Challenge" resource_version="v1" dnsName="<DOMAIN>" type="HTTP-01"
E1005 19:56:01.969387       1 sync.go:240] "unexpected non-ACME API error" err="context deadline exceeded"
E1005 19:56:01.979482       1 controller.go:157] "re-queuing item due to error processing" err="context deadline exceeded" logger="cert-manager.controller"
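
The "propagation check failed" errors above report the HTTP status that cert-manager saw when it probed the challenge endpoint through the Gateway. To reproduce that probe by hand, here is a minimal Python sketch. It assumes the check is a plain HTTP GET to the well-known ACME challenge path from RFC 8555 that must return 200 with the key authorization as the body. The function names are hypothetical and are not cert-manager APIs; the token and key are assumed to be read from the Challenge resource (for example its `spec.token` and `spec.key` fields).

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url, timeout=10.0):
    """Plain HTTP GET that returns (status, body) even for 4xx/5xx replies."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want to see.
        return err.code, err.read().decode("utf-8", "replace")


def challenge_url(domain, token):
    """Build the RFC 8555 HTTP-01 challenge URL for a domain and token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def propagation_check(domain, token, expected_key, fetch=fetch_status_and_body):
    """Return None if the challenge is reachable, else a short error string.

    Hypothetical helper: it mimics the assumed self-check, not the real code.
    """
    status, body = fetch(challenge_url(domain, token))
    if status != 200:
        return f"wrong status code '{status}', expected '200'"
    if body.strip() != expected_key:
        return "response body does not match the expected key authorization"
    return None
```

Calling `propagation_check("<DOMAIN>", token, key)` from inside and outside the cluster may help show whether the 404/503 comes from the Gateway data path or from the solver pod itself.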

Certificate resources:


apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: "2024-10-05T19:55:21Z"
  generation: 1
  name: <DOMAIN>-tls
  namespace: gateway
  ownerReferences:
  - apiVersion: gateway.networking.k8s.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Gateway
    name: main-gw
    uid: f294ac2e-0067-4041-8182-02a7e6a6a79e
  resourceVersion: "107070782"
  uid: d3a5eb42-25a3-4b9b-ba6b-130d90e96f32
spec:
  dnsNames:
  - <DOMAIN>
  issuerRef:
    group: cert-manager.io
    kind: Issuer
    name: letsencrypt
  secretName: <DOMAIN>-tls
  usages:
  - digital signature
  - key encipherment
status:
  conditions:
  - lastTransitionTime: "2024-10-05T19:55:21Z"
    message: Issuing certificate as Secret does not exist
    observedGeneration: 1
    reason: DoesNotExist
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-10-05T20:11:55Z"
    message: 'The certificate request has failed to complete and will be retried:
      Failed to wait for order resource "<DOMAIN>-tls-1-2127692413" to become ready:
      order is in "invalid" state: '
    observedGeneration: 1
    reason: Failed
    status: "False"
    type: Issuing
  failedIssuanceAttempts: 1
  lastFailureTime: "2024-10-05T20:11:55Z"


apiVersion: cert-manager.io/v1
kind: CertificateRequest
metadata:
  annotations:
    cert-manager.io/certificate-name: <DOMAIN>-tls
    cert-manager.io/certificate-revision: "1"
    cert-manager.io/private-key-secret-name: <DOMAIN>-tls-59vtw
  creationTimestamp: "2024-10-05T19:55:22Z"
  generation: 1
  name: <DOMAIN>-tls-1
  namespace: gateway
  ownerReferences:
  - apiVersion: cert-manager.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Certificate
    name: <DOMAIN>-tls
    uid: d3a5eb42-25a3-4b9b-ba6b-130d90e96f32
  resourceVersion: "107070778"
  uid: 7e70e4a5-4421-449a-bf0d-092d33bdcef1
spec:
  extra:
    authentication.kubernetes.io/credential-id:
    - JTI=bb18205d-<>-<>-<>-4efb1dbbe544
    authentication.kubernetes.io/node-name:
    - node1
    authentication.kubernetes.io/node-uid:
    - 6e297928-<>-<>-<>-8e4d5507a8dc
    authentication.kubernetes.io/pod-name:
    - cert-manager-855d849766-n9xhx
    authentication.kubernetes.io/pod-uid:
    - 80bac62a-<>-<>-<>-2a7923de992c
  groups:
  - system:serviceaccounts
  - system:serviceaccounts:cert-manager
  - system:authenticated
  issuerRef:
    group: cert-manager.io
    kind: Issuer
    name: letsencrypt
  request: LS0tL<>LQo=
  uid: 8aa51f82-<>-<>-<>-096272e1bc08
  usages:
  - digital signature
  - key encipherment
  username: system:serviceaccount:cert-manager:cert-manager
status:
  conditions:
  - lastTransitionTime: "2024-10-05T19:55:22Z"
    message: Certificate request has been approved by cert-manager.io
    reason: cert-manager.io
    status: "True"
    type: Approved
  - lastTransitionTime: "2024-10-05T19:55:22Z"
    message: 'Failed to wait for order resource "<DOMAIN>-tls-1-2127692413" to become
      ready: order is in "invalid" state: '
    reason: Failed
    status: "False"
    type: Ready
  failureTime: "2024-10-05T20:11:55Z"

Increasing the log verbosity level does not reveal anything more that would help troubleshoot the problem.

Expected behaviour:
The ACME issuer issues the certificate.

Steps to reproduce the bug:

Helm values:

config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  enableGatewayAPI: true
  kind: ControllerConfiguration
crds:
  enabled: true

Issuer:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: gateway
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-issuer-account-key
    solvers:
      - http01:
          gatewayHTTPRoute:
            parentRefs:
              - name: main-gw
                namespace: gateway
                kind: Gateway

Gateway:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gw
  namespace: gateway
  annotations:
    cert-manager.io/issuer: letsencrypt
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    hostname: <DOMAIN>
    port: 443
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All
    tls:
      mode: Terminate
      certificateRefs:
        - name: <DOMAIN>-tls
  - name: http
    hostname: <DOMAIN>
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All

If cert-manager is downgraded to 1.15.3, the certificate is issued fine, as expected.

Anything else we need to know?:

Environment details:

  • Kubernetes version: v1.30.4
  • Cloud-provider/provisioner: N/A
  • cert-manager version: 1.16.0
  • Install method: Helm
  • Cilium version: 1.16.2
  • Gateway API version: 1.1

/kind bug


Labels: kind/bug, priority/awaiting-more-evidence
