jrajahalme
Member

CT_REOPENED was originally added in #13340 to emit policy verdicts for apparently re-opened TCP connections, which are in fact more likely to be newly opened TCP connections rather than re-opened ones, as the CT entries may live minutes after the TCP state from the endpoints has already timed out.

This added complexity to call sites, forcing differentiation between CT_NEW and CT_REOPENED. In all cases some CT entry field values were left stale, e.g., 'proxy_redirect' after a policy change.

Instead of adjusting each call site to behave properly for CT_REOPENED, return CT_NEW and make the observable CT lookup behavior the same as for CT_NEW in that case, most notably by not updating the passed-in `*ct_state`.

This change fixes a proxy redirection bug where return packets are not redirected to an L7 proxy when a (stale) CT entry is missing the 'proxy_redirect' flag.

For this bug to trigger, a pod needs to open a new TCP connection using the same (ephemeral) source port to the same destination before and after a policy change that adds an L7 policy applicable to that connection. While this bug has surfaced in CI, it is less likely to be hit when policies are more stable. Given this, we need further discussion about backporting this bug fix to the stable releases.
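
The lookup behavior described above can be sketched roughly as follows. This is a simplified user-space illustration under stated assumptions, not the actual datapath code: the struct layouts, `sketch_ct_lookup`, and `SKETCH_TCP_SYN` are hypothetical stand-ins, and only the names CT_NEW, CT_ESTABLISHED, `ct_state`, and `proxy_redirect` are taken from the description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the datapath types. */
enum ct_status { CT_NEW, CT_ESTABLISHED };

struct sketch_ct_entry {
	bool rx_closing;
	bool tx_closing;
	bool proxy_redirect;
};

struct sketch_ct_state {
	bool proxy_redirect;
};

#define SKETCH_TCP_SYN 0x02

/* entry == NULL models a miss in the CT map. */
static enum ct_status
sketch_ct_lookup(const struct sketch_ct_entry *entry, uint8_t tcp_flags,
		 struct sketch_ct_state *ct_state)
{
	if (!entry)
		return CT_NEW;

	/* A SYN hitting an entry that is already closing is most likely a
	 * new connection reusing the same tuple. Report it as CT_NEW and
	 * leave *ct_state untouched, so the caller builds a fresh entry
	 * from the current policy instead of inheriting stale fields.
	 */
	if ((entry->rx_closing || entry->tx_closing) &&
	    (tcp_flags & SKETCH_TCP_SYN))
		return CT_NEW;

	/* Live entry: expose the stored fields to the caller. */
	ct_state->proxy_redirect = entry->proxy_redirect;
	return CT_ESTABLISHED;
}
```

With this shape, call sites only need to handle CT_NEW and CT_ESTABLISHED, and a stale 'proxy_redirect' value cannot leak into the state used for the new connection.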

Fixes: #27762
Fixes: #13340

Note: This PR is an alternative to #32614; only one of these should be merged.

Datapath conntrack entries for reopened connections are fully reinitialized to fix rare L7 proxy redirect failures.

@jrajahalme jrajahalme added the kind/bug, area/datapath, release-note/bug, and feature/conntrack labels May 21, 2024
@jrajahalme jrajahalme requested a review from julianwiedmann May 21, 2024 16:17
@jrajahalme jrajahalme requested review from a team as code owners May 21, 2024 16:17
@jrajahalme jrajahalme requested a review from kaworu May 21, 2024 16:17
@jrajahalme
Member Author

/test

@julianwiedmann
Member

Thank you for the context on why CT_REOPENED was introduced in the first place! That makes it way easier to evaluate the change.

Looking at all the info in the ct_entry and what's already cleared by the CT lookup, there isn't much that would be worth preserving from the "old" connection. Essentially just packets and bytes. For the lifetime, it looks like ct_create4() will end up setting the same value. We probably should have a look at how monitor gets set for the CT_REOPENED path, so that last_tx_report / last_rx_report stay in sync with that. But besides that, completely resetting the ct_entry seems alright.

So this makes a lot of sense to me 👍. And we can figure out what to do about stale CT_ESTABLISHED lookup results separately.

@jrajahalme jrajahalme force-pushed the bpf-ct-remove-reopened branch 2 times, most recently from f415f22 to 83602ad Compare May 28, 2024 10:34
@jrajahalme
Member Author

/test

Member

@kaworu kaworu left a comment

Thanks for the update, Hubble changes LGTM now.

@jrajahalme jrajahalme force-pushed the bpf-ct-remove-reopened branch from 83602ad to 21c6a94 Compare June 5, 2024 10:56
@jrajahalme
Member Author

/test

Member

@julianwiedmann julianwiedmann left a comment

Looks good, thank you Jarno!

✔️, assuming that ct_update_dsr() is re-instated to fix the build.

@julianwiedmann julianwiedmann added the area/proxy label Jun 5, 2024
CT_REOPENED was originally added in
cilium#13340 to emit policy verdicts for
apparently re-opened TCP connections, which are in fact more likely to be
newly opened TCP connections rather than re-opened ones, as the CT
entries may live minutes after the TCP state from the endpoints has
already timed out.

This added complexity to call sites, forcing differentiation between
CT_NEW and CT_REOPENED. In all cases some CT entry field values were left
stale, e.g., 'proxy_redirect' after a policy change.

Instead of adjusting each call site to behave properly for CT_REOPENED,
return CT_NEW and make the observable CT lookup behavior the same as for
CT_NEW in that case, most notably by not updating the passed-in
`*ct_state`.

This change fixes a proxy redirection bug where return packets are not
redirected to an L7 proxy when a (stale) CT entry is missing the
'proxy_redirect' flag.

Fixes: cilium#27762
Fixes: cilium#13340
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Datapath no longer returns a trace reason for REOPENED. Keep the Go
symbol for compatibility with older datapaths, but rename it to mark it
as deprecated.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the bpf-ct-remove-reopened branch from 21c6a94 to a0e0c53 Compare June 5, 2024 15:14
@jrajahalme jrajahalme requested a review from a team as a code owner June 5, 2024 15:14
@jrajahalme jrajahalme requested a review from ysksuzuki June 5, 2024 15:14
@julianwiedmann
Member

/test

@julianwiedmann julianwiedmann added the affects/v1.14 and affects/v1.15 labels Jun 14, 2024
ysksuzuki added a commit to ysksuzuki/cilium that referenced this pull request Jun 18, 2024
This commit fixes an issue where the datapath erroneously redirects
(or fails to redirect) reply packets to the proxy when a packet
hits a stale CT entry.

PR cilium#32653 fixed the issue for TCP by having __ct_lookup return
CT_NEW when a packet hits a stale closing entry, so that the caller
can recreate the entry and update the proxy_redirect flag.

This commit lets the datapath recreate the entry for non-TCP traffic in
a similar case, so that the proxy_redirect flag is updated.

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
ysksuzuki added a commit to ysksuzuki/cilium that referenced this pull request Jun 19, 2024
This commit fixes an issue where the datapath erroneously redirects
(or fails to redirect) reply packets to the proxy when a packet
hits a stale CT entry.

PR cilium#32653 fixed the issue for a TCP connection that hits a stale
closing entry by having __ct_lookup return CT_NEW in that case, so that
the caller can recreate the entry and update the proxy_redirect flag.

This commit lets the datapath also recreate the entry when non-TCP
packets hit a stale CT entry with the proxy_redirect flag, or when an
active TCP connection suddenly comes into the scope of an L7 policy.

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
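
One way to picture the recreate decision described in this commit message is the check below. It is only a sketch under assumptions: `sketch_needs_recreate` and its parameters are hypothetical, and treating a proxy port of 0 as "policy does not ask for a proxy" is an assumption made for illustration rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether an existing CT entry should be
 * recreated because its proxy_redirect flag no longer matches what the
 * current policy verdict asks for.
 *
 * Assumption: policy_proxy_port == 0 means "no L7 proxy for this flow".
 */
static bool sketch_needs_recreate(bool entry_proxy_redirect,
				  uint16_t policy_proxy_port)
{
	bool policy_wants_proxy = policy_proxy_port != 0;

	/* Stale in either direction: the entry redirects but the policy no
	 * longer does, or the policy now redirects but the entry does not.
	 */
	return entry_proxy_redirect != policy_wants_proxy;
}
```

Both directions of staleness are covered by the single inequality, which matches the "erroneously redirects (or doesn't redirect)" wording of the commit message.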
github-merge-queue bot pushed a commit that referenced this pull request Jun 21, 2024
YutaroHayakawa pushed a commit that referenced this pull request Jun 25, 2024
[ upstream commit 6552e09 ]

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
YutaroHayakawa pushed a commit that referenced this pull request Jun 27, 2024
[ upstream commit 6552e09 ]

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
christarazi pushed a commit to christarazi/cilium that referenced this pull request Aug 1, 2024
[ upstream commit 6552e09 ]

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
(cherry picked from commit 70f968a)
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
jrajahalme added a commit to jrajahalme/cilium that referenced this pull request Jul 8, 2025
Require TCP ACK flag to not be set when SYN is set to recreate a CT entry.

This addresses a problem where a CT entry is created in the reply
direction without the proxy redirect flag, even though a forward
direction CT entry with the proxy redirect flag already exists, when a
packet from the destination of an Envoy upstream connection reaches
bpf_lxc of the source pod.

The CT proxy redirect flag exists to route reply direction packets to
the proxy when they reach the source pod's bpf_lxc program. Recreating
the CT entry on the basis of the TCP SYN flag, without requiring the ACK
flag to be unset, defeats this purpose and stalls traffic on the source
pod/Envoy (downstream) connection.

Example of creation of CT entry in the reply direction (only showing
reply direction flows, SYN/ACK in the middle is for a proxy upstream
connection that needs to be redirected to the proxy instead of the source
pod):

-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x61a4168c , identity 60249->44892 state new ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x1fa7df2a , identity 60249->44892 state established ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK

CT entries, the second one being created in the reply direction:

TCP OUT 10.244.1.234:44892 -> 10.244.0.215:80 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x1b LastRxReport=1587142 TxFlagsSeen=0x00 LastTxReport=1587115 Flags=0x0051 [ RxClosing SeenNonSyn ProxyRedirect ] RevNAT=0 SourceSecurityID=44892 BackendID=0
TCP IN 10.244.0.215:80 -> 10.244.1.234:44892 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x12 LastRxReport=1587116 TxFlagsSeen=0x19 LastTxReport=1587142 Flags=0x0412 [ TxClosing SeenNonSyn FromTunnel ] RevNAT=0 SourceSecurityID=60249 BackendID=0

Requiring the ACK flag be cleared when seeing the SYN flag being set
prevents the second CT entry from being created.

Related: cilium#32653
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
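
The flag condition described in this commit message can be illustrated with a small predicate. This is a sketch, not the datapath code: `sketch_is_initial_syn` and the `TCP_FLAG_*` macro names are hypothetical, while the bit values are the standard TCP header control bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header control bits. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_PSH 0x08
#define TCP_FLAG_ACK 0x10

/* Hypothetical helper: true only when SYN is set and ACK is clear, i.e.
 * for the first packet of a handshake and not for the SYN, ACK reply.
 */
static bool sketch_is_initial_syn(uint8_t tcp_flags)
{
	return (tcp_flags & (TCP_FLAG_SYN | TCP_FLAG_ACK)) == TCP_FLAG_SYN;
}
```

A plain SYN (0x02) satisfies the predicate, while a SYN, ACK packet (0x12) like the ones in the flow log above does not, so under this check such a packet would not cause the CT entry to be recreated.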
jrajahalme added a commit to jrajahalme/cilium that referenced this pull request Jul 9, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jul 9, 2025
jrajahalme added a commit to jrajahalme/cilium that referenced this pull request Jul 9, 2025
[ upstream commit 92a3319 ]

Require TCP ACK flag to not be set when SYN is set to recreate a CT entry.

This addresses a problem where a CT entry is created in the reply
direction without the proxy redirect flag, even though a forward
direction CT entry with the proxy redirect flag already exists, when a
packet from the destination of an Envoy upstream connection reaches
bpf_lxc of the source pod.

The CT proxy redirect flag exists to route reply direction packets to
the proxy when they reach the source pod's bpf_lxc program. Recreating
the CT entry on the basis of the TCP SYN flag, without requiring the ACK
flag to be unset, defeats this purpose and stalls traffic on the source
pod/Envoy (downstream) connection.

Example of creation of CT entry in the reply direction (only showing
reply direction flows, SYN/ACK in the middle is for a proxy upstream
connection that needs to be redirected to the proxy instead of the source
pod):

-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x61a4168c , identity 60249->44892 state new ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK
-> endpoint 73 flow 0x1fa7df2a , identity 60249->44892 state established ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp SYN, ACK
-> endpoint 73 flow 0xba1ee241 , identity 60249->44892 state reply ifindex lxce2292d80c218 orig-ip 10.244.0.215: 10.244.0.215:80 -> 10.244.1.234:39194 tcp ACK

CT entries, the second one being created in the reply direction:

TCP OUT 10.244.1.234:44892 -> 10.244.0.215:80 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x1b LastRxReport=1587142 TxFlagsSeen=0x00 LastTxReport=1587115 Flags=0x0051 [ RxClosing SeenNonSyn ProxyRedirect ] RevNAT=0 SourceSecurityID=44892 BackendID=0
TCP IN 10.244.0.215:80 -> 10.244.1.234:44892 expires=1595143 Packets=0 Bytes=0 RxFlagsSeen=0x12 LastRxReport=1587116 TxFlagsSeen=0x19 LastTxReport=1587142 Flags=0x0412 [ TxClosing SeenNonSyn FromTunnel ] RevNAT=0 SourceSecurityID=60249 BackendID=0

Requiring the ACK flag be cleared when seeing the SYN flag being set
prevents the second CT entry from being created.

Related: cilium#32653
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
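
The flag check described in the commit message above can be modelled in a few lines. This is a minimal Python sketch of the decision only, not the datapath code: the function names `recreates_entry_before` and `recreates_entry_after` are hypothetical, and the real check lives in Cilium's BPF C sources, which are not shown on this page.

```python
# Standard TCP header flag bits.
TCP_SYN = 0x02
TCP_ACK = 0x10


def recreates_entry_before(tcp_flags: int) -> bool:
    """Model of the old behaviour: any packet with SYN set may recreate the CT entry."""
    return bool(tcp_flags & TCP_SYN)


def recreates_entry_after(tcp_flags: int) -> bool:
    """Model of the fixed behaviour: SYN must be set and ACK must be clear."""
    return (tcp_flags & (TCP_SYN | TCP_ACK)) == TCP_SYN


if __name__ == "__main__":
    for name, flags in [
        ("SYN", TCP_SYN),
        ("SYN, ACK", TCP_SYN | TCP_ACK),
        ("ACK", TCP_ACK),
    ]:
        print(f"{name:8} before={recreates_entry_before(flags)} "
              f"after={recreates_entry_after(flags)}")
```

In this model a plain SYN still recreates the entry, while a SYN/ACK such as the one in the flow trace (the packet reported with `state new`) no longer does, so the existing forward direction entry with the proxy redirect flag continues to be used.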
github-merge-queue bot pushed a commit that referenced this pull request Jul 10, 2025
[ upstream commit 92a3319 ]

github-merge-queue bot pushed a commit that referenced this pull request Jul 10, 2025
[ upstream commit 92a3319 ]

asauber pushed a commit to jrajahalme/cilium that referenced this pull request Jul 14, 2025
[ upstream commit 92a3319 ]

github-merge-queue bot pushed a commit that referenced this pull request Jul 14, 2025
[ upstream commit 92a3319 ]

rabelmervin pushed a commit to rabelmervin/cilium that referenced this pull request Aug 18, 2025
Labels
affects/v1.14 This issue affects v1.14 branch
affects/v1.15 This issue affects v1.15 branch
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.
feature/conntrack
kind/bug This is a bug in the Cilium logic.
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
release-note/bug This PR fixes an issue in a previous release of Cilium.
Development

Successfully merging this pull request may close these issues.

CI: Conformance E2E: client-egress-l7-named-port/pod-to-pod: command terminated with exit code 28 (timeout)
4 participants