What happened?
We are encountering a severe performance issue in kube-proxy (v1.32) whenever a Pod with a UDP port is updated (e.g., CoreDNS). In the new kube-proxy implementation, changes to Services or Pods that expose UDP ports trigger a full conntrack cleanup, and this cleanup iterates over the entire conntrack table, leading to extremely high resource consumption, in our case up to 12 GB of memory and 1.5 CPU cores per kube-proxy instance.
In a simple test, we observed 2,780 instances of the log message "Adding conntrack filter for cleanup", and the resulting cleanup caused an OOM kill when kube-proxy was limited to 256 MB of memory. Without that limit, kube-proxy memory usage spiked to 12 GB. On nodes with large conntrack tables, kube-proxy effectively gets stuck, consuming all available memory each time there is a UDP endpoint change.
The issue is systemic: every change to a Pod that exposes a UDP port causes every kube-proxy instance in the cluster to perform this expensive cleanup, and there is currently no option to disable or throttle the behavior. This disrupts cluster stability and can lead to service degradation or outages. We request that the cleanup logic be revised to target only the relevant conntrack entries, or that a mechanism be provided to disable or limit this aggressive cleanup.
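To put the numbers above in perspective, here is a rough sketch of the cost model. This is not kube-proxy's actual code; the struct fields, table size, and matching rule are assumptions for illustration. Only the filter count (2,780) comes from our logs. The point is that the dumped table is held in memory and every filter is checked against every entry, so cost grows as filters x entries:

```go
// Illustrative sketch only, not kube-proxy's real cleanup code.
package main

import "fmt"

// Much-simplified stand-in for a dumped conntrack entry.
type conntrackEntry struct {
	dstIP   string
	dstPort uint16
	proto   uint8
}

// Much-simplified stand-in for one cleanup filter.
type cleanupFilter struct {
	dstIP   string
	dstPort uint16
}

func (f cleanupFilter) matches(e conntrackEntry) bool {
	return e.proto == 17 /* UDP */ && e.dstIP == f.dstIP && e.dstPort == f.dstPort
}

func main() {
	// Assumed table size for a busy node; real nodes can hold far more entries.
	const tableSize = 100_000
	// Filter count taken from the "Adding conntrack filter for cleanup" logs in this report.
	const filterCount = 2_780

	table := make([]conntrackEntry, tableSize) // the whole dump is held in memory
	filters := make([]cleanupFilter, filterCount)

	comparisons, deleted := 0, 0
	for _, f := range filters {
		for _, e := range table {
			comparisons++
			if f.matches(e) {
				deleted++ // an entry matching a filter would be deleted here
			}
		}
	}
	fmt.Printf("entries held in memory: %d, comparisons for one UDP endpoint change: %d, matches: %d\n",
		len(table), comparisons, deleted)
}
```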
What did you expect to happen?
We expected kube-proxy to handle conntrack cleanup in a more efficient and targeted way. Even if it needs to scan a significant portion of the conntrack table, it should do so without causing a spike to 12 GB of memory usage. Ideally, it would either:
- Limit its cleanup to entries relevant to the specific changed UDP endpoint.
- Provide a way to configure or disable this aggressive cleanup process so it does not risk out-of-memory (OOM) events or excessively high CPU usage.
How can we reproduce it (as minimally and precisely as possible)?
- Deploy multiple Pods that generate a high volume of DNS requests, for example:
  - A simple Go application making repeated DNS lookups without any caching (a minimal sketch is included after this list).
- Observe kube-proxy resource usage (memory and CPU) on that node.
- Delete or update the coredns Pod (it exposes a UDP DNS port, so the change triggers the cleanup).
- Watch the logs and resource usage of kube-proxy closely, noting the surge in memory (potentially up to 12 GB) and CPU usage as it performs the conntrack cleanup.
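A minimal, uncached DNS load generator for the first step could look like the following. The lookup name, worker count, and timeout are illustrative values, not taken from our actual test workload:

```go
// dnsload.go: load generator for the reproduction above. It performs repeated,
// uncached DNS lookups, so each query tends to create a new UDP conntrack
// entry toward the cluster DNS service.
package main

import (
	"context"
	"log"
	"net"
	"sync"
	"time"
)

func main() {
	resolver := &net.Resolver{PreferGo: true} // Go resolver, no lookup caching

	const workers = 50 // illustrative concurrency
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
				// Each lookup goes over UDP to the cluster DNS (CoreDNS).
				if _, err := resolver.LookupHost(ctx, "kubernetes.default.svc.cluster.local"); err != nil {
					log.Printf("lookup failed: %v", err)
				}
				cancel()
			}
		}()
	}
	wg.Wait() // run until the Pod is stopped
}
```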
Anything else we need to know?
Kubernetes version
$ kubectl version
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.32.0-eks-5ca49cb
Cloud provider
AWS (EKS)
OS version
Amazon Linux 2
Kernel: 5.10.230-223.885.amzn2.aarch64