CFP: configure kube client-go exponential backoff by default #36525

@wedaly

Is your proposed feature related to a problem?

When running in CRD mode in large clusters, cilium-agent can sometimes overload the apiserver. This can lead to a vicious cycle: cilium-agent LIST requests overload the apiserver, the LIST requests then fail, cilium-agent resends them, and so on, so the cluster never recovers.

Describe the feature you'd like

Update the Cilium Helm chart to configure kube client-go exponential backoff in cilium-agent by default.

(Optional) Describe your proposed solution

In AKS, we have been recommending that customers running Cilium at large scale in CRD mode set the following environment variables in the cilium-agent daemonset:

- name: KUBE_CLIENT_BACKOFF_BASE
  value: "1"
- name: KUBE_CLIENT_BACKOFF_DURATION
  value: "120"

We have used this technique to successfully mitigate many production incidents, and we enable it by default in AKS-managed Cilium.

However, most Cilium users don't know that this option exists, and they need to go out of their way to configure it.

    Labels

    - area/k8s: Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
    - kind/cfp: Cilium Feature Proposal
    - kind/feature: This introduces new functionality.
    - sig/scalability: Impacts how well Cilium handles a high rate of events or churn.