
kube-controller-manager re-allocating already used PodCIDR (v1.31) #127792

@mikkeloscar

Description


What happened?

Since upgrading to Kubernetes v1.31 we have occasionally observed that kube-controller-manager allocates a PodCIDR for a new node that is already allocated to an existing node in the cluster. We have about 200 clusters with a combined total of roughly 6k-10k nodes (autoscaling). It has happened about 10 times since upgrading a week ago, so it is rather rare, but it breaks the networking between the affected nodes when it happens.

We use flannel as the overlay network, and it uses the podCIDR assigned by kube-controller-manager. When a new node is assigned an already-used podCIDR, the new node never becomes ready and is terminated after 10 minutes: before marking a node ready we check that it is routable from all other nodes, and that check can never pass because the podCIDR already points to another node in the cluster. Deleting the new node has the very unfortunate side effect that every flannel instance sees the node deletion and removes the routes for that podCIDR; but most nodes hold those routes for the original node with the same podCIDR, so its networking breaks as well.

Terminating the affected node is the best way to resolve the issue, but that is obviously not sustainable.
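
As a stopgap we can at least detect the condition: listing all nodes and grouping them by spec.podCIDR immediately reveals a duplicate assignment. A minimal sketch, assuming client-go and a local kubeconfig (illustrative only, not part of our setup):

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Illustrative: load ~/.kube/config; in-cluster config would work the same way.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }

    // Group nodes by their assigned podCIDR; more than one node per CIDR is the bug.
    byCIDR := map[string][]string{}
    for _, n := range nodes.Items {
        if n.Spec.PodCIDR != "" {
            byCIDR[n.Spec.PodCIDR] = append(byCIDR[n.Spec.PodCIDR], n.Name)
        }
    }
    for cidr, names := range byCIDR {
        if len(names) > 1 {
            fmt.Printf("duplicate podCIDR %s assigned to nodes %v\n", cidr, names)
        }
    }
}

Running a check like this shortly after node registration would let us catch the duplicate before the 10-minute termination (and the resulting route deletion) kicks in.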

What did you expect to happen?

kube-controller-manager should never allocate the same podCIDR to multiple nodes at the same time.
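
To state the expected invariant concretely: a CIDR that the allocator has marked as occupied (for example while resyncing existing nodes that already carry a spec.podCIDR) must never be returned by a later allocation until it is released on node deletion. A simplified, self-contained sketch of that bookkeeping (not the actual RangeAllocator code):

package main

import (
    "errors"
    "fmt"
)

// cidrSet is a toy stand-in for the allocator's bookkeeping: one slot per
// candidate /24 inside the cluster CIDR.
type cidrSet struct {
    used []bool
    next int // round-robin cursor
}

// Occupy marks a subnet as taken, e.g. while resyncing nodes that already
// have a spec.podCIDR set.
func (s *cidrSet) Occupy(i int) { s.used[i] = true }

// Release frees a subnet once its node has been deleted.
func (s *cidrSet) Release(i int) { s.used[i] = false }

// AllocateNext hands out a free subnet and must never return an occupied one.
func (s *cidrSet) AllocateNext() (int, error) {
    for range s.used {
        i := s.next
        s.next = (s.next + 1) % len(s.used)
        if !s.used[i] {
            s.used[i] = true
            return i, nil
        }
    }
    return 0, errors.New("CIDR range exhausted")
}

func main() {
    s := &cidrSet{used: make([]bool, 512)} // 10.2.0.0/15 split into /24s
    s.Occupy(4)                            // 10.2.4.0/24 belongs to the original node
    i, _ := s.AllocateNext()
    fmt.Printf("allocated subnet index %d (must never be 4 while it is occupied)\n", i)
}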

How can we reproduce it (as minimally and precisely as possible)?

We haven't found a predictable way to reproduce it.

We run with the pod CIDR range: 10.2.0.0/15.
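
With the /24 per-node masks visible in the logs below, that range yields 2^(24-15) = 512 possible node podCIDRs per cluster; a quick check:

package main

import "fmt"

func main() {
    clusterMaskBits, nodeMaskBits := 15, 24 // 10.2.0.0/15 cluster CIDR, /24 per node
    n := 1 << (nodeMaskBits - clusterMaskBits)
    fmt.Println(n) // 512 possible node podCIDRs
}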

Anything else we need to know?

This is how we configure kube-controller-manager: https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/cluster/node-pools/master-default/userdata.yaml#L581-L648 (the image is vanilla Kubernetes built from source and hosted in a private registry)
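
The node-IPAM-related flags boil down to roughly the following (a paraphrase using the values from this report and Kubernetes defaults; the authoritative flag set is in the linked userdata):

kube-controller-manager \
  --allocate-node-cidrs=true \
  --cluster-cidr=10.2.0.0/15 \
  --node-cidr-mask-size=24 \
  --cidr-allocator-type=RangeAllocator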

Here is an example from one cluster with logs:

Node 1 (original):
name: ip-172-31-14-210.eu-central-1.compute.internal
podCIDR: 10.2.4.0/24

Node 2 (new):
name: ip-172-31-15-241.eu-central-1.compute.internal
podCIDR: 10.2.4.0/24

kube-controller-manager logs filtered for the node ip-172-31-14-210.eu-central-1.compute.internal:

I1001 16:54:22.944453       1 actual_state_of_world.go:540] "Failed to update statusUpdateNeeded field in actual state of world" logger="persistentvolume-attach-detach-controller" err="Failed to set statusUpdateNeeded to needed true, because nodeName=\"ip-172-31-14-210.eu-central-1.compute.internal\" does not exist"
I1001 16:54:23.001587       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 16:54:23.054817       1 node_lifecycle_controller.go:884] "Missing timestamp for Node. Assuming now as a timestamp" logger="node-lifecycle-controller" node="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 16:56:52.839072       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:01:57.639919       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:03:00.584960       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:03:19.873337       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:08:26.273278       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:13:32.137435       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:18:37.436066       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:20:50.207426       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:22:06.023052       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:22:06.058496       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:22:11.270715       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:22:11.376488       1 reconciler.go:275] "attacherDetacher.DetachVolume started" logger="persistentvolume-attach-detach-controller" node="ip-172-31-14-210.eu-central-1.compute.internal" volumeName="kubernetes.io/csi/ebs.csi.aws.com^vol-01a06d931a5cbc177"
I1001 17:22:11.386310       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:22:11.392521       1 operation_generator.go:1539] Verified volume is safe to detach for volume "pvc-0ecc2568-e98d-430d-aa57-d545f7cb8c85" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-01a06d931a5cbc177") on node "ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:22:18.918976       1 operation_generator.go:437] DetachVolume.Detach succeeded for volume "pvc-0ecc2568-e98d-430d-aa57-d545f7cb8c85" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-01a06d931a5cbc177") on node "ip-172-31-14-210.eu-central-1.compute.internal"
I1001 17:23:15.637313       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"

kube-controller-manager logs filtered for the node ip-172-31-15-241.eu-central-1.compute.internal:

I1001 17:22:45.543797       1 actual_state_of_world.go:540] "Failed to update statusUpdateNeeded field in actual state of world" logger="persistentvolume-attach-detach-controller" err="Failed to set statusUpdateNeeded to needed true, because nodeName=\"ip-172-31-15-241.eu-central-1.compute.internal\" does not exist"
I1001 17:22:45.647668       1 range_allocator.go:422] "Set node PodCIDR" logger="node-ipam-controller" node="ip-172-31-15-241.eu-central-1.compute.internal" podCIDRs=["10.2.4.0/24"]
I1001 17:22:45.648019       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:45.648088       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:45.695174       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:45.702231       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:45.724468       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:45.752257       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:45.820108       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:45.846131       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:22:48.640513       1 node_lifecycle_controller.go:884] "Missing timestamp for Node. Assuming now as a timestamp" logger="node-lifecycle-controller" node="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:23:02.216555       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:23:02.273556       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:23:16.215529       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:23:21.763576       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:23:46.683733       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:25:05.756696       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"
I1001 17:25:05.786234       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-15-241.eu-central-1.compute.internal"

The interesting lines are:

I1001 16:54:23.001587       1 range_allocator.go:241] "Successfully synced" logger="node-ipam-controller" key="ip-172-31-14-210.eu-central-1.compute.internal"
...
I1001 17:22:45.647668       1 range_allocator.go:422] "Set node PodCIDR" logger="node-ipam-controller" node="ip-172-31-15-241.eu-central-1.compute.internal" podCIDRs=["10.2.4.0/24"]

At 16:54 the original node is successfully synced by the node-ipam-controller, and only at 17:22, much later, is the new node assigned the same podCIDR that is already allocated to the original node.

We can also see this from the flannel logs on the new node:

 time="2024-10-01T17:22:58Z" level=info msg="Added new node ip-172-31-14-210.eu-central-1.compute.internal with PodCIDR 10.2.4.0/24, address 10.2.4.0"

Kubernetes version

$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.0

Cloud provider

AWS running a custom Kubernetes setup on EC2

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Metadata

Labels

kind/bug, kind/regression, sig/api-machinery, sig/network, triage/accepted
