CFP: configure kube client-go exponential backoff by default #36525

**Is your proposed feature related to a problem?**

When using CRD mode in large clusters, cilium-agent can sometimes overload the apiserver. This can lead to a vicious cycle: cilium-agent LIST requests overload the apiserver, the overloaded apiserver fails those requests, cilium-agent immediately resends them, and the cluster never recovers.

**Describe the feature you'd like**

Update the Cilium Helm chart to configure kube client-go exponential backoff in cilium-agent by default.

**(Optional) Describe your proposed solution**

In AKS, we have been recommending that customers running Cilium at large scale in CRD mode set the following environment variables in the cilium-agent daemonset:

```yaml
- name: KUBE_CLIENT_BACKOFF_BASE
  value: "1"
- name: KUBE_CLIENT_BACKOFF_DURATION
  value: "120"
```
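To make the effect of these two values concrete, here is the delay schedule they should produce. This is a sketch that assumes client-go reads both variables as whole seconds (base and maximum), applies no backoff at all when they are unset, and, per apiserver host, doubles the delay after each consecutive server-side failure (5xx or no response) until it hits the cap. With base $b$ and maximum $D$, the delay before the retry following the $n$-th consecutive failure is:

$$
d_n = \min\left(b \cdot 2^{\,n-1},\; D\right), \qquad b = 1\,\text{s},\; D = 120\,\text{s}
$$

$$
d_1, \dots, d_8 = 1, 2, 4, 8, 16, 32, 64, 120\ \text{s}
$$

So after seven consecutive failures, each cilium-agent waits the full 120 s between attempts instead of retrying immediately. This spreads LIST requests out while the apiserver recovers.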

We have used this technique to successfully mitigate many production incidents, and we enable it by default in AKS-managed Cilium.

However, most Cilium users don't know that this option exists, and they need to go out of their way to configure it.
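For Helm-based installs, one way to set the same variables today is a values override. This is a minimal sketch that assumes your Cilium chart version exposes the `extraEnv` value for adding environment variables to the cilium-agent container. Check the chart's `values.yaml` for your version before relying on it.

```yaml
# Hypothetical backoff-values.yaml override. Assumes the chart's `extraEnv`
# value injects these variables into the cilium-agent container.
extraEnv:
  - name: KUBE_CLIENT_BACKOFF_BASE
    value: "1"    # initial client-go backoff, in seconds
  - name: KUBE_CLIENT_BACKOFF_DURATION
    value: "120"  # maximum client-go backoff, in seconds
```

Apply it with something like `helm upgrade cilium cilium/cilium -n kube-system --reuse-values -f backoff-values.yaml`. The release name, repository alias, namespace and file name are assumptions about a typical install. Making these values the chart default, as proposed above, would remove the need for this step.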
