What happened?
We are seeing occasional instances where a pod receiving UDP packets is
restarted and the new pod doesn't receive packets. The packets are still
being sent by the sending pods, but are being blackholed.
Specifically, there are two pods sending UDP packets to a ClusterIP service.
The UDP source port is not randomized. The ClusterIP service is backed by two
pods where only one is Ready at a given time (managed by leader election). When
the receiving pod that is the leader is deleted (gracefully), traffic usually
fails over correctly from the two sending pods to the new leader. However, in
roughly one out of every four or five restarts, traffic from one or more of the
sending pods is blackholed.
This is evidenced by the following kube-proxy logs, captured at the same time
from the two nodes where the sending pods run. A node that did clean up
conntrack entries:
And a node that did not clean up conntrack entries:
On one of these occurrences we were able to confirm that stale conntrack
entries were left behind, and that deleting them manually allowed traffic to
flow again:
conntrack -D -p udp --dport 6858 -r 172.18.209.67
udp 17 29 src=172.18.203.201 dst=10.32.216.47 sport=6852 dport=6858 [UNREPLIED] src=172.18.209.67 dst=172.18.203.201 sport=6858 dport=6852 mark=0 use=1
conntrack v1.4.7 (conntrack-tools): 1 flow entries have been deleted.
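For reference, the kind of stale entry shown above can be spotted by filtering the conntrack table for UNREPLIED UDP flows on the service port. The snippet below is a minimal sketch that runs the filter against the captured line above; on a real node you would instead pipe `conntrack -L -p udp` (run as root) into the same grep.

```shell
# Sample entry copied verbatim from the capture above; on a node, replace this
# with the live table:  conntrack -L -p udp  (requires root / CAP_NET_ADMIN).
sample='udp      17 29 src=172.18.203.201 dst=10.32.216.47 sport=6852 dport=6858 [UNREPLIED] src=172.18.209.67 dst=172.18.203.201 sport=6858 dport=6852 mark=0 use=1'

# An UNREPLIED UDP entry on the service port whose reply source (the second
# src=) is the old, now-deleted pod IP is the entry that blackholes traffic.
printf '%s\n' "$sample" | grep -E 'dport=6858 \[UNREPLIED\]'
```

Once identified, the offending entry can be deleted with the same `conntrack -D` invocation shown above.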
In addition to pod restarts, we have seen other cases leading to blackholing of
UDP traffic but don't have clear reproducers for these scenarios yet.
What did you expect to happen?
We expect that when a pod is deleted, its traffic is moved to the new pod. This happens most of the time.
How can we reproduce it (as minimally and precisely as possible)?
Deploy the sender and receiver manifests from
https://github.com/muff1nman/leader-election-udp. Mark one receiver pod ready
so that the deployment finishes rolling out:
kubectl exec -it receiver-8696979cd6-sj2bp -- bash -c "echo yes > /var/run/udplisten/ready"
Then you should have the following:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
receiver-8696979cd6-klpgj 10/10 Running 0 3m3s
receiver-8696979cd6-t88dp 0/10 Running 0 48s
sender-744d8d5dd-qh5j2 1/1 Running 0 93s
Verify the logs of the sender are as expected:
$ kubectl logs -f deploy/sender
sending hello : 2024-07-09T15:16:52+00:00
ack from receiver-8696979cd6-2kkzj-1100
sending hello : 2024-07-09T15:16:53+00:00
ack from receiver-8696979cd6-2kkzj-1101
sending hello : 2024-07-09T15:16:54+00:00
ack from receiver-8696979cd6-2kkzj-1102
sending hello : 2024-07-09T15:16:55+00:00
This is the expected good state.
Now run test.sh
to try to get the sender's traffic blackholed. The exact timing requirements
are still unknown, so it may take a few tries; in my environment it triggers
about a third of the time. Between attempts, make sure all pods are fully
Running (none in ContainerCreating). A successful trigger looks like this:
ack from receiver-8696979cd6-lc8sg-1100
sending hello : 2024-07-09T15:15:17+00:00
ack from receiver-8696979cd6-lc8sg-1101
sending hello : 2024-07-09T15:15:18+00:00
ack from receiver-8696979cd6-lc8sg-1102
sending hello : 2024-07-09T15:15:19+00:00
ack from receiver-8696979cd6-lc8sg-1103
sending hello : 2024-07-09T15:15:20+00:00
sending hello : 2024-07-09T15:15:21+00:00
sending hello : 2024-07-09T15:15:22+00:00
sending hello : 2024-07-09T15:15:23+00:00
sending hello : 2024-07-09T15:15:24+00:00
sending hello : 2024-07-09T15:15:25+00:00
sending hello : 2024-07-09T15:15:26+00:00
sending hello : 2024-07-09T15:15:27+00:00
sending hello : 2024-07-09T15:15:28+00:00
sending hello : 2024-07-09T15:15:29+00:00
sending hello : 2024-07-09T15:15:30+00:00
sending hello : 2024-07-09T15:15:31+00:00
sending hello : 2024-07-09T15:15:32+00:00
sending hello : 2024-07-09T15:15:33+00:00
sending hello : 2024-07-09T15:15:34+00:00
sending hello : 2024-07-09T15:15:35+00:00
sending hello : 2024-07-09T15:15:36+00:00
sending hello : 2024-07-09T15:15:37+00:00
sending hello : 2024-07-09T15:15:38+00:00
sending hello : 2024-07-09T15:15:39+00:00
sending hello : 2024-07-09T15:15:40+00:00
sending hello : 2024-07-09T15:15:41+00:00
sending hello : 2024-07-09T15:15:42+00:00
sending hello : 2024-07-09T15:15:43+00:00
sending hello : 2024-07-09T15:15:44+00:00
sending hello : 2024-07-09T15:15:45+00:00
sending hello : 2024-07-09T15:15:46+00:00
sending hello : 2024-07-09T15:15:47+00:00
sending hello : 2024-07-09T15:15:48+00:00
sending hello : 2024-07-09T15:15:49+00:00
sending hello : 2024-07-09T15:15:50+00:00
sending hello : 2024-07-09T15:15:51+00:00
sending hello : 2024-07-09T15:15:53+00:00
sending hello : 2024-07-09T15:15:55+00:00
sending hello : 2024-07-09T15:15:57+00:00
sending hello : 2024-07-09T15:15:59+00:00
sending hello : 2024-07-09T15:16:01+00:00
sending hello : 2024-07-09T15:16:03+00:00
sending hello : 2024-07-09T15:16:05+00:00
sending hello : 2024-07-09T15:16:07+00:00
sending hello : 2024-07-09T15:16:09+00:00
The acks stop even though one of the receiver pods is Ready to receive traffic. Rescheduling the sender pod gets traffic flowing again, as does manually cleaning up the conntrack entries.
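The stuck pattern above (repeated sends with no acks) can also be detected mechanically from the sender log. The snippet below is a hypothetical sketch that runs the check against a few lines copied from the log above; in practice you would substitute the output of `kubectl logs deploy/sender --tail=20`.

```shell
# Sample lines copied from the stuck log above; in practice, substitute
#   log="$(kubectl logs deploy/sender --tail=20)"
log='sending hello : 2024-07-09T15:16:05+00:00
sending hello : 2024-07-09T15:16:07+00:00
sending hello : 2024-07-09T15:16:09+00:00'

# Count sends and acks in the recent window; all sends and zero acks
# matches the blackholed state shown above.
sends=$(printf '%s\n' "$log" | grep -c '^sending hello')
acks=$(printf '%s\n' "$log" | grep -c '^ack from' || true)
if [ "$acks" -eq 0 ] && [ "$sends" -gt 0 ]; then
  echo "possible blackhole: $sends sends, $acks acks"
fi
```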
Anything else we need to know?
Potentially related:
- kube-proxy consider endpoint readiness to delete UDP stale conntrack entries #106163
- UDP traffic to a single Pod behind a ClusterIP is blackholed when the destination Pod is recreated (wrong conntrack) #106274
Kubernetes version
v1.30.1
Cloud provider
OS version
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
uname -a
Linux 36com699 6.6.34-cloudflare-2024.6.2 #1 SMP PREEMPT_DYNAMIC Mon Sep 27 00:00:00 UTC 2010 x86_64 GNU/Linux