Conversation

lzang
Contributor

@lzang lzang commented Sep 30, 2020

When a TCP SYN packet hits a conntrack entry that is in close-wait state,
the entry's timeout is updated and the entry is reopened. But because the
return code is CT_ESTABLISHED, no policy-verdict event is generated even
though this is a new connection. This patch fixes the issue by returning
CT_REOPENED instead in this case. Other than policy-verdict events, all
other handling is the same as if CT_ESTABLISHED had been returned.

Signed-off-by: Zang Li <zangli@google.com>

Fixes: #13339

Fix missing policy-verdict event when a session is re-opened.
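
For illustration only, a minimal C sketch of the behavior described above; the types, field names, and constants below are simplified stand-ins, not the actual definitions in bpf/lib/conntrack.h:

/* Simplified model of the lookup path this PR touches. All names here
 * are illustrative stand-ins, not Cilium's real definitions. */
#include <stdbool.h>
#include <stdint.h>

enum ct_status { CT_NEW, CT_ESTABLISHED, CT_REOPENED };

struct ct_entry_model {
	bool rx_closing;
	bool tx_closing;
};

#define TCP_FLAG_SYN 0x02

/* Before the patch this path returned CT_ESTABLISHED even when a SYN hit a
 * closing entry, so no policy-verdict event was emitted for what is, from
 * the caller's point of view, a new connection. */
static enum ct_status ct_lookup_model(struct ct_entry_model *entry,
				      uint8_t seen_tcp_flags)
{
	if ((entry->rx_closing || entry->tx_closing) &&
	    (seen_tcp_flags & TCP_FLAG_SYN)) {
		/* Reset the closing state and reuse the entry... */
		entry->rx_closing = false;
		entry->tx_closing = false;
		/* ...but tell the caller it was re-opened so a policy-verdict
		 * event can be emitted; everything else is handled exactly as
		 * if CT_ESTABLISHED had been returned. */
		return CT_REOPENED;
	}
	return CT_ESTABLISHED;
}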

@lzang lzang requested a review from a team September 30, 2020 06:48
@lzang lzang requested a review from a team as a code owner September 30, 2020 06:48
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 30, 2020
Member

@pchaigno pchaigno left a comment

I have one minor comment below, but code LGTM otherwise (and includes a unit test!)

Note for reviewers: the checkpatch warning can be ignored.

@pchaigno pchaigno added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Sep 30, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 30, 2020
@pchaigno pchaigno added needs-backport/1.8 area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. labels Sep 30, 2020
@lzang
Contributor Author

lzang commented Sep 30, 2020

Hi Paul and Maciej, I have another question: I would expect the timeout to be CT_SYN_TIMEOUT instead of CT_CONNECTION_LIFETIME_TCP for the reopened case. This can be fixed with the change below. What do you think? If you agree, I can add it on top of the current commit.

--- a/bpf/lib/conntrack.h
+++ b/bpf/lib/conntrack.h
@@ -244,6 +244,7 @@ static __always_inline __u8 __ct_lookup(const void *map, struct __ctx_buff *ctx,
                        reopen |= seen_flags.value & TCP_FLAG_SYN;
                        if (unlikely(reopen == (TCP_FLAG_SYN|0x1))) {
                                ct_reset_closing(entry);
+                               entry->seen_non_syn = 0;
                                *monitor = ct_update_timeout(entry, is_tcp, dir, seen_flags);
                                return CT_REOPENED;
                        }
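
For context, the suggestion relies on the assumption that the timeout helper applies the shorter CT_SYN_TIMEOUT only while seen_non_syn is unset, and the full TCP connection lifetime otherwise. A rough, self-contained sketch of that assumed behavior (constants and names are illustrative, not the real values from conntrack.h):

/* Illustrative model of the timeout selection the suggestion relies on:
 * clearing seen_non_syn would make the next update pick the shorter SYN
 * timeout instead of the full TCP connection lifetime. Values are made up. */
#include <stdbool.h>
#include <stdint.h>

#define CT_SYN_TIMEOUT_MODEL             60u     /* seconds, illustrative */
#define CT_CONNECTION_LIFETIME_TCP_MODEL 21600u  /* seconds, illustrative */

struct ct_entry_model {
	bool seen_non_syn;
};

static uint32_t ct_update_timeout_model(struct ct_entry_model *entry,
					bool pkt_is_syn_only)
{
	/* Once any non-SYN segment has been seen, the entry keeps the long
	 * lifetime; a freshly (re)opened entry with only a SYN seen would
	 * get the short SYN timeout. */
	if (!pkt_is_syn_only)
		entry->seen_non_syn = true;

	return entry->seen_non_syn ? CT_CONNECTION_LIFETIME_TCP_MODEL
				   : CT_SYN_TIMEOUT_MODEL;
}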

@joestringer
Member

@lzang Regarding the timeout question, do we know for sure that such a connection will actually be new vs. a partially closed connection which the peers are reopening?

I wonder whether anyone has published some real-world observations or recommendations for handling TCP connection reopening, I've always seen it as something that is theoretically possible but am not familiar with common use cases or what kind of expectations apps might have in this scenario (and hence for instance how our selection of timeouts might impact such apps). It's easy to trigger such codepaths with a testing tool that just reuses ports, but in that case I think it's rare that there's any real impact from the timeouts we select since the test will often open & close the session within a short window.

@lzang
Contributor Author

lzang commented Oct 1, 2020

@lzang Regarding the timeout question, do we know for sure that such a connection will actually be new vs. a partially closed connection which the peers are reopening?

The code path is triggered only on a SYN request. What flag would be sent if it is a partially closed connection that the peers are reopening? Also, the timeout is updated again on the next packet that hits the same conntrack entry.

I wonder whether anyone has published some real-world observations or recommendations for handling TCP connection reopening, I've always seen it as something that is theoretically possible but am not familiar with common use cases or what kind of expectations apps might have in this scenario (and hence for instance how our selection of timeouts might impact such apps). It's easy to trigger such codepaths with a testing tool that just reuses ports, but in that case I think it's rare that there's any real impact from the timeouts we select since the test will often open & close the session within a short window.

I am not aware of any published work there. I think the selection of the timeout mainly affects how fast the resources can be recycled. But I agree the scenario here is a corner case and its impact on a real system should be small.

@lzang
Contributor Author

lzang commented Oct 1, 2020

The K8s-1.12-Kernel-netnext test fails with the following error; is this a test flake?

Stderr:
error: unable to upgrade connection: Authorization error (user=kube-apiserver-kubelet-client, verb=create, resource=nodes, subresource=proxy)

@pchaigno
Member

pchaigno commented Oct 1, 2020

do we know for sure that such a connection will actually be new vs. a partially closed connection which the peers are reopening?

Is there such a thing as reopening a close-wait TCP connection? RFC793 mentions the following on reception of a SYN packet while in close-wait state (page 71):

If the SYN is in the window it is an error, send a reset, any
outstanding RECEIVEs and SEND should receive "reset" responses,
all segment queues should be flushed, the user should also
receive an unsolicited general "connection reset" signal, enter
the CLOSED state, delete the TCB, and return.

@joestringer
Member

@pchaigno Hmm, I was thinking more about time-wait rather than close-wait; see RFC1122 Section 4.2.2.13:

When a connection is closed actively, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY accept a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:

       (1)  assigns its initial sequence number for the new
            connection to be larger than the largest sequence
            number it used on the previous connection incarnation,
            and

       (2)  returns to TIME-WAIT state if the SYN turns out to be
            an old duplicate.

@lzang
Contributor Author

lzang commented Oct 1, 2020

Is there such a thing as reopening a close-wait TCP connection? RFC793 mentions the following on reception of a SYN packet while in close-wait state (page 71):

If the SYN is in the window it is an error, send a reset, any
outstanding RECEIVEs and SEND should receive "reset" responses,
all segment queues should be flushed, the user should also
receive an unsolicited general "connection reset" signal, enter
the CLOSED state, delete the TCB, and return.

I think we are not talking about the TCP layer here, and the re-open is just a re-use of the old entry from conntrack's view. The connection may have been considered as closed from TCP layer's view.

@lzang
Contributor Author

lzang commented Oct 1, 2020

/test-me-please

@joestringer
Member

I think we are not talking about the TCP layer here, and the re-open is just a re-use of the old entry from conntrack's view. The connection may have been considered as closed from TCP layer's view.

Before this PR, both a connection reopen from time-wait and a new connection reusing the same tuple would be treated as the existing connection, and no policy verdict would be emitted. So this is correct for the time-wait/re-open case but wrong for the tuple re-use case (which should be considered a new connection).

With the PR as it currently stands, both cases above would be considered new, which is correct for the tuple re-use case but could be considered wrong for the time-wait/reopen case, since it's the same connection, just being reopened.

The only way to accurately determine whether the connection is new or being reopened in both cases, AFAICT, would be to also track sequence numbers, which is probably more work than we really care to do.

I think the argument of this PR is that, overall, Cilium is more likely to send policy verdict notifications accurately across these cases because (particularly in local testing) reuse of the tuple for a new connection is more likely than a reopen of an existing time-wait connection. Or maybe it could be argued that even in the time-wait case above, in some sense another "opening" of a connection is triggered and therefore we should log the policy verdict. I'm fine with either of these assessments.

Regarding the timeouts, my reading of the time-wait text above is that, for that case, we should resume the regular connection timeouts. But as you say, the timeout would be updated on the next connection update anyway, so it's probably not that consequential?

Given the comments above and inside __ct_update_timeout(), I would probably err on the side of not clearing the seen_non_syn bit unless we have some pretty clear guidance to describe why it's important or necessary.
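
To make the caller-side effect concrete, here is a rough sketch of how a datapath caller might consume the new return code, emitting a policy-verdict event for both CT_NEW and CT_REOPENED while otherwise treating CT_REOPENED like CT_ESTABLISHED; the helper functions are hypothetical placeholders, not Cilium's actual API:

/* Hypothetical caller-side handling; all functions here are placeholder
 * stubs, not real Cilium datapath code. */
enum ct_status { CT_NEW, CT_ESTABLISHED, CT_REOPENED };

static void emit_policy_verdict_event(void) { /* stub: send monitor event */ }
static int handle_new_connection(void) { return 0; }         /* stub */
static int handle_established_connection(void) { return 0; } /* stub */

static int handle_ct_result(enum ct_status ret)
{
	switch (ret) {
	case CT_NEW:
		emit_policy_verdict_event();
		return handle_new_connection();
	case CT_REOPENED:
		/* A re-opened entry now also produces a policy-verdict
		 * event... */
		emit_policy_verdict_event();
		/* ...but is otherwise handled like an established
		 * connection. */
		return handle_established_connection();
	case CT_ESTABLISHED:
	default:
		return handle_established_connection();
	}
}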

@lzang
Contributor Author

lzang commented Oct 1, 2020

Thanks Joe for the explanation! Now I see what you mean. I'm not sure how frequently a TCP connection would be re-opened with a SYN after entering the time-wait state, as most of the time time-wait is there to catch data packets that arrive after the FIN. Also, time-wait with 2xMSL is usually less than a couple of minutes, but the conntrack entry can stick around for quite a long time in Cilium due to the GC mechanism (please correct me if I am wrong), so the chance of hitting the tuple-reuse case could be larger. Agreed that this is not a perfect solution, but it should work for most cases, and we probably don't need to be 100% accurate here :-)

When a TCP SYN packet hits a conntrack entry that is in close-wait
state, the entry's timeout is updated and the entry is reopened. But
because the return code is CT_ESTABLISHED, no policy-verdict event is
generated even though this is a new connection. This patch fixes the
issue by returning CT_REOPENED instead in this case. Except for
policy-verdict events, all other handling is the same as if
CT_ESTABLISHED had been returned.

Signed-off-by: Zang Li <zangli@google.com>
@pchaigno
Member

pchaigno commented Oct 6, 2020

CI is green except for checkpatch, but we can ignore that particular warning (fixing it would require a lot more changes).

@joestringer joestringer merged commit ada53a9 into cilium:master Oct 6, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 6, 2020
jrajahalme added a commit to jrajahalme/cilium that referenced this pull request May 21, 2024
CT_REOPENED was originally added in
cilium#13340 to emit policy verdicts for
apparently re-opened TCP connections, which are in fact more likely to be
newly opened TCP connections rather than re-opened ones, as the CT
entries may live minutes after the TCP state from the endpoints has
already timed out.

This added complexity: call sites had to differentiate between CT_NEW
and CT_REOPENED, and failed to update various CT entry field values in
the process, 'proxy_redirect' being one of them.

Instead of adjusting each call site to behave properly for CT_REOPENED,
return CT_NEW instead, and make the observable CT lookup behavior the
same as for CT_NEW in that case, most notably by not updating the passed
in `*ct_state'.

This change fixes a proxy redirection bug where return packets are not
redirected to an L7 proxy when a (stale) CT entry is missing the
'proxy_redirect' flag.

Fixes: cilium#27762
Fixes: cilium#13340
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 6, 2024
CT_REOPENED was originally added in
#13340 to emit policy verdicts for
apparently re-opened TCP connections, which are in fact more likely to be
newly opened TCP connections rather than re-opened ones, as the CT
entries may live minutes after the TCP state from the endpoints has
already timed out.

This added complexity to call sites, forcing differentiation between
CT_NEW and CT_REOPENED. In all cases some CT entry field values were left
stale, e.g., 'proxy_redirect' after a policy change.

Instead of adjusting each call site to behave properly for CT_REOPENED,
return CT_NEW instead, and make the observable CT lookup behavior the
same as for CT_NEW in that case, most notably by not updating the passed
in `*ct_state'.

This change fixes a proxy redirection bug where return packets are not
redirected to an L7 proxy when a (stale) CT entry is missing the
'proxy_redirect' flag.

Fixes: #27762
Fixes: #13340
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Successfully merging this pull request may close these issues.

Not all policy verdict events are reported in load test