What happened?
We are seeing occasional instances where a pod receiving UDP packets is
restarted and the new pod doesn't receive packets. The packets are still
being sent by the sending pods, but are being blackholed.
Specifically, there are two pods sending UDP packets to a ClusterIP service.
The UDP source port is not randomized. The ClusterIP service is backed by two
pods where only one is Ready at a given time (managed by leader election). When
the receiving pod that is the leader is deleted (gracefully), traffic usually
fails over correctly from the two sending pods to the new leader. However, in
roughly one out of every four or five restarts, traffic from one or more of the
sending pods is blackholed.
This is evidenced by the following kube-proxy logs, captured at the same time
from the two nodes where the sending pods run. A node that did clean up
conntrack entries:
And a node that did not clean up conntrack entries:
On one of these occurrences we were able to confirm that stale conntrack
entries were left behind, and that deleting them manually allowed traffic to
flow again:
conntrack -D -p udp --dport 6858 -r 172.18.209.67
udp 17 29 src=172.18.203.201 dst=10.32.216.47 sport=6852 dport=6858 [UNREPLIED] src=172.18.209.67 dst=172.18.203.201 sport=6858 dport=6852 mark=0 use=1
conntrack v1.4.7 (conntrack-tools): 1 flow entries have been deleted.
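For reference, the kind of stale entry shown above can be spotted by filtering the conntrack table for UNREPLIED UDP flows on the service port. The snippet below is a minimal sketch that runs the filter against the captured line above; on a real node you would instead pipe `conntrack -L -p udp` (run as root) into the same grep.

```shell
# Sample entry copied verbatim from the capture above; on a node, replace this
# with the live table:  conntrack -L -p udp  (requires root / CAP_NET_ADMIN).
sample='udp      17 29 src=172.18.203.201 dst=10.32.216.47 sport=6852 dport=6858 [UNREPLIED] src=172.18.209.67 dst=172.18.203.201 sport=6858 dport=6852 mark=0 use=1'

# An UNREPLIED UDP entry on the service port whose reply source (the second
# src=) is the old, now-deleted pod IP is the entry that blackholes traffic.
printf '%s\n' "$sample" | grep -E 'dport=6858 \[UNREPLIED\]'
```

Once identified, the offending entry can be deleted with the same `conntrack -D` invocation shown above.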
In addition to pod restarts, we have seen other cases leading to blackholing of
UDP traffic but don't have clear reproducers for these scenarios yet.
What did you expect to happen?
We expect that when a pod is deleted, its traffic is moved to the new pod. This happens most of the time.
How can we reproduce it (as minimally and precisely as possible)?
Deploy the sender and receiver manifests from
https://github.com/muff1nman/leader-election-udp. Mark one receiver pod ready
so that the deployment finishes rolling out:
kubectl exec -it receiver-8696979cd6-sj2bp -- bash -c "echo yes > /var/run/udplisten/ready"
Then you should have the following:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
receiver-8696979cd6-klpgj 10/10 Running 0 3m3s
receiver-8696979cd6-t88dp 0/10 Running 0 48s
sender-744d8d5dd-qh5j2 1/1 Running 0 93s
Verify the logs of the sender are as expected:
$ kubectl logs -f deploy/sender
sending hello : 2024-07-09T15:16:52+00:00
ack from receiver-8696979cd6-2kkzj-1100
sending hello : 2024-07-09T15:16:53+00:00
ack from receiver-8696979cd6-2kkzj-1101
sending hello : 2024-07-09T15:16:54+00:00
ack from receiver-8696979cd6-2kkzj-1102
sending hello : 2024-07-09T15:16:55+00:00
This is the expected good state.
Now run test.sh
to try to get the sender's traffic blackholed. The exact timing requirements
are still unknown, so it may take a few tries; in my environment it triggers
about a third of the time. Between attempts, make sure all pods are fully
Running (none in ContainerCreating). A successful trigger looks like this:
ack from receiver-8696979cd6-lc8sg-1100
sending hello : 2024-07-09T15:15:17+00:00
ack from receiver-8696979cd6-lc8sg-1101
sending hello : 2024-07-09T15:15:18+00:00
ack from receiver-8696979cd6-lc8sg-1102
sending hello : 2024-07-09T15:15:19+00:00
ack from receiver-8696979cd6-lc8sg-1103
sending hello : 2024-07-09T15:15:20+00:00
sending hello : 2024-07-09T15:15:21+00:00
sending hello : 2024-07-09T15:15:22+00:00
sending hello : 2024-07-09T15:15:23+00:00
sending hello : 2024-07-09T15:15:24+00:00
sending hello : 2024-07-09T15:15:25+00:00
sending hello : 2024-07-09T15:15:26+00:00
sending hello : 2024-07-09T15:15:27+00:00
sending hello : 2024-07-09T15:15:28+00:00
sending hello : 2024-07-09T15:15:29+00:00
sending hello : 2024-07-09T15:15:30+00:00
sending hello : 2024-07-09T15:15:31+00:00
sending hello : 2024-07-09T15:15:32+00:00
sending hello : 2024-07-09T15:15:33+00:00
sending hello : 2024-07-09T15:15:34+00:00
sending hello : 2024-07-09T15:15:35+00:00
sending hello : 2024-07-09T15:15:36+00:00
sending hello : 2024-07-09T15:15:37+00:00
sending hello : 2024-07-09T15:15:38+00:00
sending hello : 2024-07-09T15:15:39+00:00
sending hello : 2024-07-09T15:15:40+00:00
sending hello : 2024-07-09T15:15:41+00:00
sending hello : 2024-07-09T15:15:42+00:00
sending hello : 2024-07-09T15:15:43+00:00
sending hello : 2024-07-09T15:15:44+00:00
sending hello : 2024-07-09T15:15:45+00:00
sending hello : 2024-07-09T15:15:46+00:00
sending hello : 2024-07-09T15:15:47+00:00
sending hello : 2024-07-09T15:15:48+00:00
sending hello : 2024-07-09T15:15:49+00:00
sending hello : 2024-07-09T15:15:50+00:00
sending hello : 2024-07-09T15:15:51+00:00
sending hello : 2024-07-09T15:15:53+00:00
sending hello : 2024-07-09T15:15:55+00:00
sending hello : 2024-07-09T15:15:57+00:00
sending hello : 2024-07-09T15:15:59+00:00
sending hello : 2024-07-09T15:16:01+00:00
sending hello : 2024-07-09T15:16:03+00:00
sending hello : 2024-07-09T15:16:05+00:00
sending hello : 2024-07-09T15:16:07+00:00
sending hello : 2024-07-09T15:16:09+00:00
The acks stop even though one of the receiver pods is Ready to receive traffic. Rescheduling the sender pod gets traffic flowing again, as does manually cleaning up the conntrack entries.
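The stuck pattern above (repeated sends with no acks) can also be detected mechanically from the sender log. The snippet below is a hypothetical sketch that runs the check against a few lines copied from the log above; in practice you would substitute the output of `kubectl logs deploy/sender --tail=20`.

```shell
# Sample lines copied from the stuck log above; in practice, substitute
#   log="$(kubectl logs deploy/sender --tail=20)"
log='sending hello : 2024-07-09T15:16:05+00:00
sending hello : 2024-07-09T15:16:07+00:00
sending hello : 2024-07-09T15:16:09+00:00'

# Count sends and acks in the recent window; all sends and zero acks
# matches the blackholed state shown above.
sends=$(printf '%s\n' "$log" | grep -c '^sending hello')
acks=$(printf '%s\n' "$log" | grep -c '^ack from' || true)
if [ "$acks" -eq 0 ] && [ "$sends" -gt 0 ]; then
  echo "possible blackhole: $sends sends, $acks acks"
fi
```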
Anything else we need to know?
Potentially related:
- kube-proxy consider endpoint readiness to delete UDP stale conntrack entries #106163
- UDP traffic to a single Pod behind a ClusterIP is blackholed when the destination Pod is recreated (wrong conntrack) #106274
Kubernetes version
v1.30.1
Cloud provider
OS version
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
uname -a
Linux 36com699 6.6.34-cloudflare-2024.6.2 #1 SMP PREEMPT_DYNAMIC Mon Sep 27 00:00:00 UTC 2010 x86_64 GNU/Linux