
Stale service conntrack entries causing packet drop #32472

@sypakine

Description

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Intermittent occurrence of Service backend not found errors:

$ cilium monitor --type drop
...
xx drop (Service backend not found) flow 0x7558de48 to endpoint 0, file bpf_lxc.c line 1480, , identity 44899->unknown: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN
xx drop (Service backend not found) flow 0x6de3ccb3 to endpoint 0, file bpf_lxc.c line 1480, , identity 44899->unknown: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN
xx drop (Service backend not found) flow 0xb913610f to endpoint 0, file bpf_lxc.c line 1480, , identity 44899->unknown: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN

However, there are backends for the service (and other connections to 34.1.1.100:443 from the same source pod work as expected):

# cilium bpf lb list | grep 34.1.1.100:443/i -A 6
34.1.1.100:443/i     10.196.158.2:8443 (88) (8)
                        10.196.179.3:8443 (88) (9)
                        10.196.177.24:8443 (88) (4)
                        10.196.128.2:8443 (88) (6)
                        0.0.0.0:0 (88) (0) [LoadBalancer, Local]
                        10.196.155.225:8443 (88) (7)
                        10.196.179.2:8443 (88) (10)

I turned on datapath debugging and discovered this is due to a stale service conntrack entry:

<- endpoint 1804 flow 0xb00723b3 , identity 44899->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN
CPU 04: MARK 0xb00723b3 FROM 1804 DEBUG: Conntrack lookup 1/2: src=10.196.161.9:45318 dst=34.1.1.100:443
CPU 04: MARK 0xb00723b3 FROM 1804 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=4
CPU 04: MARK 0xb00723b3 FROM 1804 DEBUG: CT entry found lifetime=20637745, revnat=88
CPU 04: MARK 0xb00723b3 FROM 1804 DEBUG: CT verdict: Reply, revnat=88
CPU 04: MARK 0xb00723b3 FROM 1804 DEBUG: Backend service lookup failed: backend_id=272
xx drop (Service backend not found) flow 0xb00723b3 to endpoint 0, file bpf_lxc.c line 1480, , identity 44899->unknown: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN
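
As a side note, the backend ID from the debug message can be checked against the loaded backend map directly. A minimal sketch, assuming this Cilium version supports the `--backends` flag on `cilium bpf lb list` (exact flag and output format may differ):

$ cilium bpf lb list --backends | grep -w "272"
# empty output would mean backend_id=272 is no longer present in the backend map,
# consistent with the "Backend service lookup failed" debug message above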

Note that the packet from the client is a SYN segment, but the conntrack lookup treats it as an already-open connection.

So I dumped the service conntrack entry and waited for another request with that connection tuple:

Before:

$ cilium bpf ct list global -Dd | grep "TCP OUT 34.1.1.100:443 -> 10.196.161.9:" | grep "\[ SeenNonSyn \]" | grep "10.196.161.9:45318"

TCP OUT 34.1.1.100:443 -> 10.196.161.9:45318 service expires=25446799 (remaining: 19197 sec(s)) RxPackets=0 RxBytes=314 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=25362412 Flags=0x0010 [ SeenNonSyn ] RevNAT=88 SourceSecurityID=0 IfIndex=0

Observe flow for tuple:

$ cilium monitor --related-to 1123 | grep "45318"
Listening for events on 16 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor

xx drop (Service backend not found) flow 0x45ed4e5f to endpoint 0, file bpf_lxc.c line 1480, , identity 44899->unknown: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN
xx drop (Service backend not found) flow 0x59b66eb5 to endpoint 0, file bpf_lxc.c line 1480, , identity 44899->unknown: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN
xx drop (Service backend not found) flow 0xb5441506 to endpoint 0, file bpf_lxc.c line 1480, , identity 44899->unknown: 10.196.161.9:45318 -> 34.1.1.100:443 tcp SYN
^C
Received an interrupt, disconnecting from monitor...

After:

$ cilium bpf ct list global -Dd | grep "TCP OUT 34.1.1.100:443 -> 10.196.161.9:" | grep "\[ SeenNonSyn \]" | grep "10.196.161.9:45318"

TCP OUT 34.1.1.100:443 -> 10.196.161.9:45318 service expires=25456278 (remaining: 21578 sec(s)) RxPackets=0 RxBytes=314 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=25371891 Flags=0x0010 [ SeenNonSyn ] RevNAT=88 SourceSecurityID=0 IfIndex=0

So it looks like, even on a service backend miss, there is no feedback to the conntrack state to invalidate the stale entry. If a client makes frequent short-lived connections to the service, there is a high probability it will reuse a connection tuple as it cycles through the ephemeral source ports.
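
For reference, here is a rough shell sketch of the correlation I did by hand, using the same two commands as above, to catch these as they happen. The awk field positions are an assumption based on the monitor output format shown here:

$ cilium monitor --type drop 2>/dev/null \
    | grep --line-buffered "Service backend not found" \
    | while read -r line; do
        # drop lines end with: <client ip:port> -> <service vip:port> tcp <flags>
        src=$(echo "$line" | awk '{print $(NF-4)}')   # e.g. 10.196.161.9:45318
        dst=$(echo "$line" | awk '{print $(NF-2)}')   # e.g. 34.1.1.100:443
        echo "drop: $src -> $dst"
        # service CT entries are keyed as: TCP OUT <vip:port> -> <client ip:port> service ...
        cilium bpf ct list global -Dd 2>/dev/null \
          | grep "TCP OUT $dst -> $src service"
      done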


One observation is that a SYN segment hitting an open conntrack entry is fine, as long as the entry's backend index still resolves to an existing backend:

<- endpoint 1804 flow 0x2847f5e0 , identity 44899->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.196.161.9:56744 -> 34.1.1.100:443 tcp SYN
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: Conntrack lookup 1/2: src=10.196.161.9:56744 dst=34.1.1.100:443
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=4
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: CT entry found lifetime=20657581, revnat=88
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: CT verdict: Reply, revnat=88
# No debug message for this step, but it means the service backend lookup for the index succeeded
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: Conntrack lookup 1/2: src=10.196.161.9:56744 dst=10.196.169.6:8443
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=1
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: CT entry found lifetime=20573245, revnat=88
CPU 07: MARK 0x2847f5e0 FROM 1804 DEBUG: CT verdict: Interface Decrypted, revnat=88
CPU 01: MARK 0x2847f5e0 FROM 1804 DEBUG: Successfully mapped addr=10.196.169.6 to identity=782
CPU 01: MARK 0x2847f5e0 FROM 1804 DEBUG: Matched L4 policy; creating conntrack src=782 dst=44899 dport=8443 proto=6
Policy verdict log: flow 0x2847f5e0 local EP ID 1804, remote ID 782, proto 6, egress, action allow, match L3-L4, 10.196.161.9:56744 -> 10.196.169.6:8443 tcp SYN
-> stack flow 0x2847f5e0 , identity 44899->782 state reopened ifindex 0 orig-ip 0.0.0.0: 10.196.161.9:56744 -> 10.196.169.6:8443 tcp SYN
...

### Cilium Version

Client: 1.12.10 c0f319ad22 2024-02-06T18:46:58+00:00 go version go1.20.14 linux/amd64

### Kernel Version

Linux node_name 5.15.146+ #1 SMP Sat Feb 17 13:12:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

### Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.11-gke.1062001", GitCommit:"2cefeadcb4ec7d21d775d15012b02d3393a53548", GitTreeState:"clean", BuildDate:"2024-04-16T20:17:53Z", GoVersion:"go1.21.7 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

### Regression

_No response_

### Sysdump

_No response_

### Relevant log output

_No response_

### Anything else?

I have been unable to reproduce the issue. My attempt has been to (a rough command sketch follows the list):

1. Scale the service backend deployment to 100 replicas (to create many backends to be indexed in the service map)
2. Create a long-lived TCP connection from a client to a service backend
3. After the TCP handshake, sever the connection (turn down the lxc interface)
4. Scale the backends down to 1
5. Kill the client container via `crictl`
6. Turn the lxc interface back up
7. Reuse the connection tuple to create a new connection
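
Roughly, in commands (the deployment, pod, interface, and container names below are placeholders, and I'm assuming a client image with netcat so the source port can be pinned):

# 1-2: many backends, then a long-lived connection pinned to a fixed source port
$ kubectl scale deployment my-backend --replicas=100
$ kubectl exec client-pod -- nc -p 45318 34.1.1.100 443 &

# 3: sever the established connection from the node side
$ ip link set dev lxc1234abcd down

# 4-5: shrink the backend set so the old backend index disappears, then kill the client
$ kubectl scale deployment my-backend --replicas=1
$ crictl stop <client-container-id>

# 6-7: bring the interface back up and reuse the same tuple from the restarted client
$ ip link set dev lxc1234abcd up
$ kubectl exec client-pod -- nc -p 45318 34.1.1.100 443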

---

Also, is there a way to show the backend index for a service-type conntrack entry? Even with JSON/YAML output I don't see it.

### Cilium Users Document

- [ ] Are you a user of Cilium? Please add yourself to the [Users doc](https://github.com/cilium/cilium/blob/main/USERS.md)

### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
