Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
When an Azure load balancer is deployed to expose a Kubernetes service, it never becomes healthy and never load-balances traffic, because its health probes fail.
The probes reach the nodes properly, but they appear to be dropped/denied by Cilium, so the load balancer never receives responses to the probes.
The LB probes have a source address from the IPv6 link-local address space, in my case fe80::1234:5678:9abc.
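As a sanity check that the node can actually route a reply back to the link-local probe source (a sketch; the interface name eth0 is an assumption for this node):

# Show the neighbour cache entries on the interface the probes arrive on
ip -6 neigh show dev eth0
# Ask the kernel how it would route a reply to the link-local source
ip -6 route get fe80::1234:5678:9abc oif eth0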
tcpdump on the node:
tcpdump -nei any ip6 and port 30460
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12:52:51.558487 enP47348s1 In ifindex 3 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65168 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 1547269981, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:51.558487 eth0 In ifindex 2 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65168 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 1547269981, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:52.573384 enP47348s1 In ifindex 3 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65168 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 1547269981, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:52.573384 eth0 In ifindex 2 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65168 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 1547269981, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:54.579086 enP47348s1 In ifindex 3 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65168 > fddc:bbc4:705:2::a.30460: Flags [S], seq 1547269981, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:54.579086 eth0 In ifindex 2 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65168 > fddc:bbc4:705:2::a.30460: Flags [S], seq 1547269981, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:57.574024 enP47348s1 In ifindex 3 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65330 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 3203281907, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:57.574024 eth0 In ifindex 2 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65330 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 3203281907, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:58.587394 enP47348s1 In ifindex 3 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65330 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 3203281907, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:52:58.587394 eth0 In ifindex 2 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65330 > fddc:bbc4:705:2::a.30460: Flags [SEW], seq 3203281907, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:53:00.601813 enP47348s1 In ifindex 3 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65330 > fddc:bbc4:705:2::a.30460: Flags [S], seq 3203281907, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
12:53:00.601813 eth0 In ifindex 2 12:34:56:78:9a:bc ethertype IPv6 (0x86dd), length 92: fe80::1234:5678:9abc.65330 > fddc:bbc4:705:2::a.30460: Flags [S], seq 3203281907, win 64800, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
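Note that the capture only shows the inbound SYNs being retransmitted; no SYN-ACK ever appears on the wire. A filter like the following confirms this (a sketch that assumes no IPv6 extension headers, so the TCP flags byte sits at the fixed offset ip6[53], i.e. 40 bytes of IPv6 header plus 13 bytes into the TCP header):

# Match only SYN-ACK segments on the NodePort (0x12 = SYN|ACK)
tcpdump -nei any 'ip6 and port 30460 and ip6[53] & 0x12 == 0x12'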
I managed to capture logs from Hubble and cilium monitor. Both contain drop verdicts, but I captured those drop verdicts only once and never again later.
The issue is reproducible either by letting the Azure cloud provider expose the Kubernetes service automatically or by creating a NodePort Kubernetes service and deploying the load balancer in Azure manually; a minimal sketch of the manual path follows.
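(Service and deployment names below are assumptions; any backend works, since the probes fail before ever reaching it.)

# Create a NodePort service pinned to the probed port
kubectl create deployment probe-test --image=nginx
kubectl create service nodeport probe-test --tcp=80:80 --node-port=30460
# Then create an Azure load balancer with an IPv6 frontend and a TCP
# health probe against node port 30460; the backend never turns healthy.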
Cilium Version
v1.12.4
v1.13.2
Kernel Version
5.15.0-1036-azure #43-Ubuntu SMP Wed Mar 29 16:11:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Client Version: v1.26.0
Kustomize Version: v4.5.7
Server Version: v1.25.4
Sysdump
No response
Relevant log output
hubble observe -f --verdict DROPPED
Apr 26 11:12:23.171: [fddc:bbc4:705:2::4]:30460 (host) <> [fe80::1234:5678:9abc]:54675 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 26 11:12:23.395: [fddc:bbc4:705:2::4]:30460 (host) <> [fe80::1234:5678:9abc]:54503 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 26 11:12:24.155: [fddc:bbc4:705:2::4]:30460 (host) <> [fe80::1234:5678:9abc]:54769 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 26 11:12:25.154: [fddc:bbc4:705:2::4]:30460 (host) <> [fe80::1234:5678:9abc]:54769 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
cilium monitor -t drop
Listening for events on 8 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (Stale or unroutable IP) flow 0x2885480f to endpoint 0, file bpf_host.c line 351, , identity host->unknown: [fddc:bbc4:705:2::4]:30460 -> [fe80::1234:5678:9abc]:56055 tcp SYN, ACK
xx drop (Stale or unroutable IP) flow 0xa247e0c9 to endpoint 0, file bpf_host.c line 351, , identity host->unknown: [fddc:bbc4:705:2::4]:30460 -> [fe80::1234:5678:9abc]:56055 tcp SYN, ACK
xx drop (Stale or unroutable IP) flow 0x6aa6b80e to endpoint 0, file bpf_host.c line 351, , identity host->unknown: [fddc:bbc4:705:2::4]:30460 -> [fe80::1234:5678:9abc]:55938 tcp SYN, ACK
xx drop (Stale or unroutable IP) flow 0x84d4eaf1 to endpoint 0, file bpf_host.c line 351, , identity host->unknown: [fddc:bbc4:705:2::4]:30460 -> [fe80::1234:5678:9abc]:55564 tcp SYN, ACK
xx drop (Stale or unroutable IP) flow 0x33bb69e7 to endpoint 0, file bpf_host.c line 351, , identity host->unknown: [fddc:bbc4:705:2::4]:30460 -> [fe80::1234:5678:9abc]:56055 tcp SYN, ACK
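The drops are all on the reply path: it is the host's SYN-ACK toward the link-local probe source that gets discarded. Even when monitor/hubble stop showing the events, the agent's drop counters can confirm whether packets are still being discarded (a sketch; namespace and daemonset name are assumptions for a typical install):

# Dump Cilium's drop metrics and filter for the reason seen above
kubectl -n kube-system exec ds/cilium -- cilium metrics list | grep -i drop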
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct