cainjector in a zombie state after attempting to shut down #5889

@AcidLeroy

Describe the bug:
cainjector appeared to be in a zombie state and was no longer injecting certificates. A manual restart of the pod resolved the issue.

Related Slack discussion

Expected behaviour:
I would expect the pod to restart automatically once the injector has stopped working.

Steps to reproduce the bug:
Currently, I do not have a good way to reproduce the issue.

Anything else we need to know?:
Here are the last statements printed in the cainjector logs. Kubernetes reported that the pod was in a "Running" state, but the logs suggest that the injector had shut down.

E0320 19:10:38.975779       1 leaderelection.go:330] error retrieving resource lock vmware-system-cert-manager/cert-manager-cainjector-leader-election: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/vmware-system-cert-manager/leases/cert-manager-cainjector-leader-election": context deadline exceeded
I0320 19:10:38.975865       1 leaderelection.go:283] failed to renew lease vmware-system-cert-manager/cert-manager-cainjector-leader-election: timed out waiting for the condition
E0320 19:10:38.975918       1 leaderelection.go:306] Failed to release lock: resource name may not be empty
E0320 19:10:38.975960       1 start.go:192] cert-manager/ca-injector "msg"="manager goroutine exited" "error"="error running manager: leader election lost"
I0320 19:10:38.976007       1 shared_informer.go:281] stop requested
E0320 19:10:38.976035       1 source.go:144] cert-manager/controller-runtime/source "msg"="failed to get informer from cache" "error"="Timeout: failed waiting for *v1.Certificate Informer to sync"
I0320 19:10:38.976047       1 recorder.go:103] cert-manager/events "msg"="4235d478ec8206aa92eafee84e7dcda6_ea3dfc3d-cda9-4277-b73b-9bb17ea864cf stopped leading" "object"={"kind":"Lease","apiVersion":"coordination.k8s.io/v1"} "reason"="LeaderElection" "type"="Normal"
I0320 19:10:38.976090       1 internal.go:567] cert-manager "msg"="Stopping and waiting for non leader election runnables"
I0320 19:10:38.976119       1 internal.go:571] cert-manager "msg"="Stopping and waiting for leader election runnables"
I0320 19:10:38.976120       1 controller.go:247] cert-manager/secret/customresourcedefinition "msg"="Shutdown signal received, waiting for all workers to finish"
I0320 19:10:38.976096       1 controller.go:247] cert-manager/secret/mutatingwebhookconfiguration "msg"="Shutdown signal received, waiting for all workers to finish"
I0320 19:10:38.976130       1 internal.go:577] cert-manager "msg"="Stopping and waiting for caches"
I0320 19:10:38.976149       1 controller.go:227] cert-manager/certificate/customresourcedefinition "msg"="Starting workers" "worker count"=1
I0320 19:10:38.976155       1 shared_informer.go:281] stop requested
I0320 19:10:38.976157       1 controller.go:247] cert-manager/secret/validatingwebhookconfiguration "msg"="Shutdown signal received, waiting for all workers to finish"
I0320 19:10:38.976165       1 trace.go:205] Trace[1232570623]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/reflector.go:169 (20-Mar-2023 19:10:22.720) (total time: 16255ms):
Trace[1232570623]: [16.255622996s] [16.255622996s] END
E0320 19:10:38.976178       1 source.go:144] cert-manager/controller-runtime/source "msg"="failed to get informer from cache" "error"="Timeout: failed waiting for *v1.Certificate Informer to sync"
I0320 19:10:38.976180       1 shared_informer.go:281] stop requested
I0320 19:10:38.976180       1 shared_informer.go:281] stop requested
I0320 19:10:38.976197       1 shared_informer.go:281] stop requested
I0320 19:10:38.976199       1 shared_informer.go:281] stop requested
I0320 19:10:38.976188       1 shared_informer.go:281] stop requested
I0320 19:10:38.976201       1 shared_informer.go:281] stop requested
I0320 19:10:38.976166       1 controller.go:247] cert-manager/secret/apiservice "msg"="Shutdown signal received, waiting for all workers to finish"
I0320 19:10:38.976226       1 controller.go:227] cert-manager/certificate/validatingwebhookconfiguration "msg"="Starting workers" "worker count"=1
E0320 19:10:38.976229       1 controller.go:210] cert-manager/certificate/apiservice "msg"="Could not wait for Cache to sync" "error"="failed to wait for controller-for-certificate-apiservice caches to sync: cache did not sync"
E0320 19:10:38.976236       1 controller.go:210] cert-manager/certificate/mutatingwebhookconfiguration "msg"="Could not wait for Cache to sync" "error"="failed to wait for controller-for-certificate-mutatingwebhookconfiguration caches to sync: failed to get informer from cache: Timeout: failed waiting for *v1.Certificate Informer to sync"
I0320 19:10:38.976174       1 internal.go:581] cert-manager "msg"="Stopping and waiting for webhooks"
I0320 19:10:38.976171       1 shared_informer.go:281] stop requested
I0320 19:10:38.976191       1 shared_informer.go:281] stop requested
I0320 19:10:38.976258       1 internal.go:585] cert-manager "msg"="Wait completed, proceeding to shutdown the manager"
I0320 19:10:38.976263       1 controller.go:249] cert-manager/secret/apiservice "msg"="All workers finished"
I0320 19:10:38.976143       1 controller.go:249] cert-manager/secret/customresourcedefinition "msg"="All workers finished"
I0320 19:10:38.976238       1 controller.go:247] cert-manager/certificate/validatingwebhookconfiguration "msg"="Shutdown signal received, waiting for all workers to finish"
E0320 19:10:38.976271       1 source.go:144] cert-manager/controller-runtime/source "msg"="failed to get informer from cache" "error"="Timeout: failed waiting for *v1.Certificate Informer to sync"
I0320 19:10:38.976198       1 shared_informer.go:281] stop requested
I0320 19:10:38.976292       1 controller.go:247] cert-manager/certificate/customresourcedefinition "msg"="Shutdown signal received, waiting for all workers to finish"
E0320 19:10:38.976297       1 source.go:144] cert-manager/controller-runtime/source "msg"="failed to get informer from cache" "error"="Timeout: failed waiting for *v1.Certificate Informer to sync"
I0320 19:11:03.253581       1 controller.go:178] cert-manager/secret/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="cert-manager-webhook" "resource_namespace"="" "resource_version"="v1"
I0320 19:11:03.253647       1 controller.go:249] cert-manager/secret/mutatingwebhookconfiguration "msg"="All workers finished"
I0320 19:11:03.253694       1 controller.go:178] cert-manager/secret/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="cert-manager-webhook" "resource_namespace"="" "resource_version"="v1"
I0320 19:11:03.253747       1 controller.go:249] cert-manager/secret/validatingwebhookconfiguration "msg"="All workers finished"

Environment details:

  • Kubernetes version: v1.24
  • Cloud-provider/provisioner: N/A
  • cert-manager version: v1.10.2
  • Install method: kustomized manifests from release.

/kind bug

Labels:

  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
