Skip to content

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Jul 9, 2025

[ upstream commit 92a3319 ]

Require TCP ACK flag to not be set when SYN is set to recreate a CT entry.

This addresses the problem where CT entry is created in the reply direction without a proxy redirect flag when a forward direction CT entry with proxy redirect flag already exists, when a packet from an Envoy upstream connection destination reaches bpf_lxc of the source pod.

The CT proxy redirect flag exists for the purpose of routing reply direction packets to the proxy when they reach the source pod's bpf_lxc program. Recreting the CT entry on the basis of the TCP SYN flag without requiring the ACK flag to be unset defeats this purpose and stalls traffic on source pod/Envoy (downstream) connection.

Example of creation of CT entry in the reply direction (only showing reply direction flows, SYN/ACK in the middle is for a proxy upstream connection that needs to be redirected to the proxy instead of the source pod):

-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x61a4168c , identity 60249->44892 state new ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x1fa7df2a , identity 60249->44892 state established ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK

CT entries, the second one being created in the reply direction:

TCP OUT 10.244.1.234:44892 -> 10.244.0.215:80 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x1b LastRxReport=1587142 TxFlagsSeen=0x00 LastTxReport=1587115 Flags=0x0051 [ RxClosing SeenNonSyn ProxyRedirect ] RevNAT=0 SourceSecurityID=44892 BackendID=0
TCP IN 10.244.0.215:80 -> 10.244.1.234:44892 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x12 LastRxReport=1587116 TxFlagsSeen=0x19 LastTxReport=1587142 Flags=0x0412 [ TxClosing SeenNonSyn FromTunnel ] RevNAT=0 SourceSecurityID=60249 BackendID=0

Requiring the ACK flag be cleared when seeing the SYN flag being set prevents the second CT entry from being created.

Related: #32653

Bpf datapath TCP conntrack entries are (re)created only in the forward direction, solving an issue with freezing proxy connections when backend connection is re-opened.

@jrajahalme jrajahalme requested a review from a team as a code owner July 9, 2025 19:05
@jrajahalme jrajahalme added kind/backports This PR provides functionality previously merged into master. backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. labels Jul 9, 2025
@jrajahalme
Copy link
Member Author

/test

@mathpl
Copy link
Contributor

mathpl commented Jul 10, 2025

Is this backport aimed at solving #40462? I still see non-cluster connectivity tests failing in the conformance EKS run here.

@jrajahalme
Copy link
Member Author

Is this backport aimed at solving #40462? I still see non-cluster connectivity tests failing in the conformance EKS run here.

No, this fixing an unrelated bug and is just taking collateral damage from the EKS bug(?).

[ upstream commit 92a3319 ]

Require TCP ACK flag to not be set when SYN is set to recreate a CT entry.

This addresses the problem where CT entry is created in the reply
direction without a proxy redirect flag when a forward direction CT entry
with proxy redirect flag already exists, when a packet from an Envoy
upstream connection destination reaches bpf_lxc of the source pod.

The CT proxy redirect flag exists for the purpose of routing reply
direction packets to the proxy when they reach the source pod's bpf_lxc
program. Recreting the CT entry on the basis of the TCP SYN flag without
requiring the ACK flag to be unset defeats this purpose and stalls
traffic on source pod/Envoy (downstream) connection.

Example of creation of CT entry in the reply direction (only showing
reply direction flows, SYN/ACK in the middle is for a proxy upstream
connection that needs to be redirected to the proxy instead of the source
pod):

-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x61a4168c , identity 60249->44892 state new ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x1fa7df2a , identity 60249->44892 state established ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK

CT entries, the second one being created in the reply direction:

TCP OUT 10.244.1.234:44892 -> 10.244.0.215:80 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x1b LastRxReport=1587142 TxFlagsSeen=0x00 LastTxReport=1587115 Flags=0x0051 [ RxClosing SeenNonSyn ProxyRedirect ] RevNAT=0 SourceSecurityID=44892 BackendID=0
TCP IN 10.244.0.215:80 -> 10.244.1.234:44892 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x12 LastRxReport=1587116 TxFlagsSeen=0x19 LastTxReport=1587142 Flags=0x0412 [ TxClosing SeenNonSyn FromTunnel ] RevNAT=0 SourceSecurityID=60249 BackendID=0

Requiring the ACK flag be cleared when seeing the SYN flag being set
prevents the second CT entry from being created.

Related: cilium#32653
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@asauber asauber force-pushed the bpf-conntrack-recreate-fix-1.16 branch from 9de3136 to 517dd02 Compare July 14, 2025 11:12
@asauber
Copy link
Member

asauber commented Jul 14, 2025

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 14, 2025
@aanm aanm added this pull request to the merge queue Jul 14, 2025
Merged via the queue into cilium:v1.16 with commit 255bf46 Jul 14, 2025
60 of 62 checks passed
@joestringer
Copy link
Member

Heads up that since this didn't follow the standard backport pattern, the release notes generation was a bit off for this PR:

https://github.com/cilium/cilium/blob/dd0556cc1c9f91cf011bacc9d9e309f4b0eb376f/CHANGELOG.md#v11612

Specifically, this ended up in the "Other changes" and not "bugfix" section. There are two ways to mitigate this in future:

  • Ensure that backport PRs have a section that looks like this in the description of the PR:
    ```upstream-prs
    40427
    ```
    
    Note that this also makes the integration with the backport-pending/1.X labels work to automatically update to backport-done/1.X.
  • If the PR doesn't have a corresponding main PR, you can just set the release-note/bug label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants