Conversation

lzang
Contributor

@lzang lzang commented Sep 30, 2020

When a TCP SYN packet hits a conntrack entry that is in close-wait state,
the entry's timeout is updated and the entry is reopened. But because the
return code is CT_ESTABLISHED, no policy-verdict event is generated even
though this is a new connection. This patch fixes the issue by returning
CT_REOPENED instead in this case. Other than policy-verdict events, all
other handling is the same as if CT_ESTABLISHED had been returned.

Signed-off-by: Zang Li <zangli@google.com>

Fixes: #13339

Fix missing policy-verdict event when a session is re-opened.
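
For illustration only, a minimal C sketch of the behavior described above; the types, field names, and constants below are simplified stand-ins, not the actual definitions in bpf/lib/conntrack.h:

/* Simplified model of the lookup path this PR touches. All names here
 * are illustrative stand-ins, not Cilium's real definitions. */
#include <stdbool.h>
#include <stdint.h>

enum ct_status { CT_NEW, CT_ESTABLISHED, CT_REOPENED };

struct ct_entry_model {
	bool rx_closing;
	bool tx_closing;
};

#define TCP_FLAG_SYN 0x02

/* Before the patch this path returned CT_ESTABLISHED even when a SYN hit a
 * closing entry, so no policy-verdict event was emitted for what is, from
 * the caller's point of view, a new connection. */
static enum ct_status ct_lookup_model(struct ct_entry_model *entry,
				      uint8_t seen_tcp_flags)
{
	if ((entry->rx_closing || entry->tx_closing) &&
	    (seen_tcp_flags & TCP_FLAG_SYN)) {
		/* Reset the closing state and reuse the entry... */
		entry->rx_closing = false;
		entry->tx_closing = false;
		/* ...but tell the caller it was re-opened so a policy-verdict
		 * event can be emitted; everything else is handled exactly as
		 * if CT_ESTABLISHED had been returned. */
		return CT_REOPENED;
	}
	return CT_ESTABLISHED;
}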

@lzang lzang requested a review from a team September 30, 2020 06:48
@lzang lzang requested a review from a team as a code owner September 30, 2020 06:48
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 30, 2020
Member

@pchaigno pchaigno left a comment

I have one minor comment below, but code LGTM otherwise (and includes a unit test!)

Note for reviewers: the checkpatch warning can be ignored.

@pchaigno pchaigno added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Sep 30, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 30, 2020
@pchaigno pchaigno added needs-backport/1.8 area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. labels Sep 30, 2020
@lzang
Contributor Author

lzang commented Sep 30, 2020

Hi Paul and Maciej, I have another question: I would expect the timeout to be CT_SYN_TIMEOUT instead of CT_CONNECTION_LIFETIME_TCP for the reopened case. This can be fixed with the change below. What do you think? If you agree, I can add it on top of the current commit.

--- a/bpf/lib/conntrack.h
+++ b/bpf/lib/conntrack.h
@@ -244,6 +244,7 @@ static __always_inline __u8 __ct_lookup(const void *map, struct __ctx_buff *ctx,
                        reopen |= seen_flags.value & TCP_FLAG_SYN;
                        if (unlikely(reopen == (TCP_FLAG_SYN|0x1))) {
                                ct_reset_closing(entry);
+                               entry->seen_non_syn = 0;
                                *monitor = ct_update_timeout(entry, is_tcp, dir, seen_flags);
                                return CT_REOPENED;
                        }
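
For context, the suggestion relies on the assumption that the timeout helper applies the shorter CT_SYN_TIMEOUT only while seen_non_syn is unset, and the full TCP connection lifetime otherwise. A rough, self-contained sketch of that assumed behavior (constants and names are illustrative, not the real values from conntrack.h):

/* Illustrative model of the timeout selection the suggestion relies on:
 * clearing seen_non_syn would make the next update pick the shorter SYN
 * timeout instead of the full TCP connection lifetime. Values are made up. */
#include <stdbool.h>
#include <stdint.h>

#define CT_SYN_TIMEOUT_MODEL             60u     /* seconds, illustrative */
#define CT_CONNECTION_LIFETIME_TCP_MODEL 21600u  /* seconds, illustrative */

struct ct_entry_model {
	bool seen_non_syn;
};

static uint32_t ct_update_timeout_model(struct ct_entry_model *entry,
					bool pkt_is_syn_only)
{
	/* Once any non-SYN segment has been seen, the entry keeps the long
	 * lifetime; a freshly (re)opened entry with only a SYN seen would
	 * get the short SYN timeout. */
	if (!pkt_is_syn_only)
		entry->seen_non_syn = true;

	return entry->seen_non_syn ? CT_CONNECTION_LIFETIME_TCP_MODEL
				   : CT_SYN_TIMEOUT_MODEL;
}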

@joestringer
Member

@lzang Regarding the timeout question, do we know for sure that such a connection will actually be new vs. a partially closed connection which the peers are reopening?

I wonder whether anyone has published some real-world observations or recommendations for handling TCP connection reopening, I've always seen it as something that is theoretically possible but am not familiar with common use cases or what kind of expectations apps might have in this scenario (and hence for instance how our selection of timeouts might impact such apps). It's easy to trigger such codepaths with a testing tool that just reuses ports, but in that case I think it's rare that there's any real impact from the timeouts we select since the test will often open & close the session within a short window.

@lzang
Contributor Author

lzang commented Oct 1, 2020

@lzang Regarding the timeout question, do we know for sure that such a connection will actually be new vs. a partially closed connection which the peers are reopening?

The code path is triggered only on a SYN request. What flag would be sent if it is a partially closed connection that the peers are reopening? Also, the timeout is updated again on the next packet that hits the same conntrack entry.

I wonder whether anyone has published some real-world observations or recommendations for handling TCP connection reopening, I've always seen it as something that is theoretically possible but am not familiar with common use cases or what kind of expectations apps might have in this scenario (and hence for instance how our selection of timeouts might impact such apps). It's easy to trigger such codepaths with a testing tool that just reuses ports, but in that case I think it's rare that there's any real impact from the timeouts we select since the test will often open & close the session within a short window.

I am not aware of any published work there. I think the selection of the timeout mainly affects how fast the resources can be recycled. But I agree the scenario here is a corner case and its impact on a real system should be small.

@lzang
Contributor Author

lzang commented Oct 1, 2020

The K8s-1.12-Kernel-netnext test fails with the following error; is this a test flake?

Stderr:
error: unable to upgrade connection: Authorization error (user=kube-apiserver-kubelet-client, verb=create, resource=nodes, subresource=proxy)

@pchaigno
Member

pchaigno commented Oct 1, 2020

do we know for sure that such a connection will actually be new vs. a partially closed connection which the peers are reopening?

Is there such a thing as reopening a close-wait TCP connection? RFC793 mentions the following on reception of a SYN packet while in close-wait state (page 71):

If the SYN is in the window it is an error, send a reset, any
outstanding RECEIVEs and SEND should receive "reset" responses,
all segment queues should be flushed, the user should also
receive an unsolicited general "connection reset" signal, enter
the CLOSED state, delete the TCB, and return.

@joestringer
Member

@pchaigno Hmm, I was thinking more about time-wait rather than close-wait; see RFC1122 Section 4.2.2.13:

When a connection is closed actively, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY accept a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:

       (1)  assigns its initial sequence number for the new
            connection to be larger than the largest sequence
            number it used on the previous connection incarnation,
            and

       (2)  returns to TIME-WAIT state if the SYN turns out to be
            an old duplicate.

@lzang
Contributor Author

lzang commented Oct 1, 2020

Is there such a thing as reopening a close-wait TCP connection? RFC793 mentions the following on reception of a SYN packet while in close-wait state (page 71):

If the SYN is in the window it is an error, send a reset, any
outstanding RECEIVEs and SEND should receive "reset" responses,
all segment queues should be flushed, the user should also
receive an unsolicited general "connection reset" signal, enter
the CLOSED state, delete the TCB, and return.

I think we are not talking about the TCP layer here, and the re-open is just a re-use of the old entry from conntrack's view. The connection may have been considered as closed from TCP layer's view.

@lzang
Contributor Author

lzang commented Oct 1, 2020

/test-me-please

@joestringer
Member

I think we are not talking about the TCP layer here, and the re-open is just a re-use of the old entry from conntrack's view. The connection may have been considered as closed from TCP layer's view.

Before this PR, both a connection reopen from time-wait and a new connection reusing the same tuple would be treated as the existing connection, and no policy verdict would be emitted. So this is correct for the time-wait/re-open case but wrong for the tuple re-use case (which should be considered a new connection).

With the PR as it currently stands, both cases above would be considered new, which is correct for the tuple re-use case but could be considered wrong for the time-wait/reopen case, since it's the same connection, just being reopened.

The only way to accurately determine whether the connection is new or being reopened in both cases, AFAICT, would be to also track sequence numbers, which is probably more work than we really care to do.

I think the argument of this PR is that, overall, Cilium is more likely to send policy verdict notifications accurately across these cases because (particularly in local testing) reuse of the tuple for a new connection is more likely than a reopen of an existing time-wait connection. Or maybe it could be argued that even in the time-wait case above, in some sense another "opening" of a connection is triggered and therefore we should log the policy verdict. I'm fine with either of these assessments.

Regarding the timeouts, my reading of the time-wait text above is that, for that case, we should resume the regular connection timeouts. But as you say, the timeout would be updated on the next connection update anyway, so it's probably not that consequential?

Given the comments above and inside __ct_update_timeout(), I would probably err on the side of not clearing the seen_non_syn bit unless we have some pretty clear guidance to describe why it's important or necessary.
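
To make the caller-side effect concrete, here is a rough sketch of how a datapath caller might consume the new return code, emitting a policy-verdict event for both CT_NEW and CT_REOPENED while otherwise treating CT_REOPENED like CT_ESTABLISHED; the helper functions are hypothetical placeholders, not Cilium's actual API:

/* Hypothetical caller-side handling; all functions here are placeholder
 * stubs, not real Cilium datapath code. */
enum ct_status { CT_NEW, CT_ESTABLISHED, CT_REOPENED };

static void emit_policy_verdict_event(void) { /* stub: send monitor event */ }
static int handle_new_connection(void) { return 0; }         /* stub */
static int handle_established_connection(void) { return 0; } /* stub */

static int handle_ct_result(enum ct_status ret)
{
	switch (ret) {
	case CT_NEW:
		emit_policy_verdict_event();
		return handle_new_connection();
	case CT_REOPENED:
		/* A re-opened entry now also produces a policy-verdict
		 * event... */
		emit_policy_verdict_event();
		/* ...but is otherwise handled like an established
		 * connection. */
		return handle_established_connection();
	case CT_ESTABLISHED:
	default:
		return handle_established_connection();
	}
}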

@lzang
Contributor Author

lzang commented Oct 1, 2020

Thanks Joe for the explanation! Now I see what you mean. I'm not sure how frequently a TCP connection would be re-opened with a SYN after entering the time-wait state, as most of the time time-wait is there to catch data packets that arrive after the FIN. Also, time-wait with 2xMSL is usually less than a couple of minutes, but the conntrack entry can stick around for quite a long time in Cilium due to the GC mechanism (please correct me if I am wrong), so the chance of hitting the tuple-reuse case could be larger. Agreed that this is not a perfect solution, but it should work for most cases, and we probably don't need to be 100% accurate here :-)

When a TCP SYN packet hits a conntrack entry that is in close-wait
state, the entry's timeout is updated and the entry is reopened. But
because the return code is CT_ESTABLISHED, no policy-verdict event is
generated even though this is a new connection. This patch fixes the
issue by returning CT_REOPENED instead in this case. Except for
policy-verdict events, all other handling is the same as if
CT_ESTABLISHED had been returned.

Signed-off-by: Zang Li <zangli@google.com>
@pchaigno
Member

pchaigno commented Oct 6, 2020

CI is green except for checkpatch, but we can ignore that particular warning (fixing it would require a lot more changes).

@joestringer joestringer merged commit ada53a9 into cilium:master Oct 6, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 6, 2020
jrajahalme added a commit to jrajahalme/cilium that referenced this pull request May 21, 2024
CT_REOPENED was originally added in
cilium#13340 to emit policy verdicts for
apparently re-opened TCP connections, which are in fact more likely to be
newly opened TCP connections rather than re-opened ones, as the CT
entries may live minutes after the TCP state from the endpoints has
already timed out.

This added complexity: call sites had to differentiate between CT_NEW
and CT_REOPENED, and failed to update various CT entry field values in
the process, 'proxy_redirect' being one of them.

Instead of adjusting each call site to behave properly for CT_REOPENED,
return CT_NEW instead, and make the observable CT lookup behavior the
same as for CT_NEW in that case, most notably by not updating the passed
in `*ct_state'.

This change fixes a proxy redirection bug where return packets are not
redirected to an L7 proxy when a (stale) CT entry is missing the
'proxy_redirect' flag.

Fixes: cilium#27762
Fixes: cilium#13340
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 6, 2024
CT_REOPENED was originally added in
#13340 to emit policy verdicts for
apparently re-opened TCP connections, which are in fact more likely to be
newly opened TCP connections rather than re-opened ones, as the CT
entries may live minutes after the TCP state from the endpoints has
already timed out.

This added complexity to call sites, forcing differentiation between
CT_NEW and CT_REOPENED. In all cases some CT entry field values were left
stale, e.g., 'proxy_redirect' after a policy change.

Instead of adjusting each call site to behave properly for CT_REOPENED,
return CT_NEW instead, and make the observable CT lookup behavior the
same as for CT_NEW in that case, most notably by not updating the passed
in `*ct_state'.

This change fixes a proxy redirection bug where return packets are not
redirected to an L7 proxy when a (stale) CT entry is missing the
'proxy_redirect' flag.

Fixes: #27762
Fixes: #13340
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Successfully merging this pull request may close these issues.

Not all policy verdict events are reported in load test