Add the return code CT_REOPENED for conntrack table lookup. #13340
Conversation
I have one minor comment below, but code LGTM otherwise (and includes a unit test!)
Note for reviewers: the checkpatch warning can be ignored.
Hi Paul and Maciej, I have another question: I would expect the timeout to be CT_SYN_TIMEOUT instead of CT_CONNECTION_LIFETIME_TCP for the reopened case. This can be fixed with the following change.
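The diff referenced above is not included in this excerpt; below is only a rough sketch of the idea, assuming illustrative timeout values and a hypothetical helper name rather than Cilium's actual conntrack code: a re-opened entry that has only seen a SYN so far would get the shorter half-open timeout until the handshake completes.

```c
/* Hypothetical sketch; the constants' values and the helper name are
 * illustrative, not taken from bpf/lib/conntrack.h. */
#define CT_SYN_TIMEOUT             60     /* seconds, half-open connection */
#define CT_CONNECTION_LIFETIME_TCP 21600  /* seconds, established TCP      */

/* Lifetime to use when a SYN re-opens an existing entry: treat it as
 * half-open until the handshake completes. */
static unsigned int ct_reopen_timeout(int only_syn_seen)
{
	return only_syn_seen ? CT_SYN_TIMEOUT : CT_CONNECTION_LIFETIME_TCP;
}
```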
@lzang Regarding the timeout question, do we know for sure that such a connection will actually be new vs. a partially closed connection which the peers are reopening? I wonder whether anyone has published real-world observations or recommendations for handling TCP connection reopening. I've always seen it as something that is theoretically possible, but I'm not familiar with common use cases or with what kind of expectations apps might have in this scenario (and hence, for instance, how our selection of timeouts might impact such apps). It's easy to trigger such codepaths with a testing tool that just reuses ports, but in that case I think it's rare that there's any real impact from the timeouts we select, since the test will often open & close the session within a short window.
The code path is triggered only on a SYN request. What flag would be sent if it is a partially closed connection that the peers are reopening? Also, the timeout would be updated on the next packet that hits the same conntrack entry.
I am not aware of any published work there. I think the selection of the timeout mainly affects how fast you can recycle the resource. But I agree that the scenario here is a corner case and its impact on a real system should be small.
The K8s-1.12-Kernel-netnext test fails with the following error - is it test flakiness? Stderr:
Is there such a thing as reopening a close-wait TCP connection? RFC793 (page 71) treats reception of an in-window SYN while in close-wait state as an error: the connection is reset and the TCB deleted.
@pchaigno Hmm, I was thinking more about time-wait rather than close-wait; see RFC1122 Section 4.2.2.13, which allows a TCP to accept a new SYN from the remote TCP to reopen the connection directly from time-wait state.
I think we are not talking about the TCP layer here; the re-open is just a re-use of the old entry from conntrack's point of view. The connection may already have been considered closed from the TCP layer's point of view.
/test-me-please
Before this PR, both connection re-opens from time-wait and new connections reusing the same tuple would be treated as the existing connection, and no policy verdict would be emitted. That is correct for the time-wait/re-open case but wrong for the tuple re-use case (which should be considered a new connection). With this PR, both cases are considered new, which is correct for the tuple re-use case but could be considered wrong for the time-wait/re-open case, since it's the same connection just being reopened. The only way to accurately distinguish a new connection from a reopened one in both cases, AFAICT, would be to also track sequence numbers, which is probably more work than we really care to do.
I think the argument of this PR is that overall, Cilium is more likely to send policy verdict notifications accurately across these cases because (particularly in local testing) reuse of the tuple for a new connection is more likely than re-opening of an existing time-wait connection? Or maybe it could be argued that even in the time-wait case above, in some sense it is triggering another "opening" of a connection and therefore we should log the policy verdict. I'm fine with either of these assessments.
Regarding the timeouts, my reading of the time-wait case above is that we should resume the regular connection timeouts there. But as you say, the timeout would be updated on the next connection update anyway, so it's probably not that consequential? Given the comments above and inside
Thanks Joe for the explanation! Now I see what you mean. I'm not sure how frequently a TCP connection would be re-opened with a SYN after entering the time-wait state, as most of the time time-wait is used to catch data packets that arrive later than the FIN. Also, time-wait with 2xMSL is usually less than a couple of minutes, but the conntrack entry can stick around for quite a long time in Cilium due to the GC mechanism (please correct me if I am wrong), so the chance of hitting the tuple-reuse case could be larger. I agree that this is not a perfect solution, but it should work for most cases, and we probably don't need to be 100% accurate here :-)
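For reference, the sequence-number tracking mentioned in the exchange above could look roughly like the heuristic below. This is purely illustrative: Cilium's CT entries do not store sequence numbers, and the helper names are made up. The idea follows RFC 1122 Section 4.2.2.13: a SYN that genuinely re-opens a time-wait connection advances past the previous incarnation's sequence space, while an unrelated tuple re-use gives no such guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number 'a' comes after 'b' in 32-bit wraparound space. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Hypothetical check: does this SYN look like a re-open of the previous
 * incarnation rather than an unrelated re-use of the same tuple? */
static bool syn_looks_like_reopen(uint32_t syn_seq, uint32_t last_seq_seen)
{
	return seq_after(syn_seq, last_seq_seen);
}
```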
When a TCP SYN packet hits a conntrack entry that is in close-wait state, the entry timeout will be modified and the entry will be reopened. But because the return code is CT_ESTABLISHED, no policy-verdict event will be generated even though this is a new connection. This patch fixes the issue by returning CT_REOPENED instead in this case. Except for policy verdict events, all other handling is the same as if CT_ESTABLISHED were returned. Signed-off-by: Zang Li <zangli@google.com>
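As a rough, self-contained illustration of the behavior described above (not the actual bpf/lib/conntrack.h code; the entry fields, helper name, and enum values are simplified assumptions): a SYN that hits an entry already marked as closing refreshes the entry and is reported as CT_REOPENED, and the caller emits a policy verdict only for CT_NEW and CT_REOPENED.

```c
#include <stdbool.h>
#include <stdio.h>

enum ct_status {
	CT_NEW,          /* no entry exists for this tuple yet       */
	CT_ESTABLISHED,  /* existing, still-open entry               */
	CT_REOPENED,     /* SYN re-used an entry in a closing state  */
};

struct ct_entry {
	bool rx_closing;       /* FIN/RST seen from the peer  */
	bool tx_closing;       /* FIN/RST seen from this side */
	unsigned int lifetime; /* remaining timeout, seconds  */
};

/* Hypothetical lookup of a SYN packet against an existing entry (or NULL). */
static enum ct_status ct_lookup_syn(struct ct_entry *entry,
				    unsigned int tcp_lifetime)
{
	if (!entry)
		return CT_NEW;

	if (entry->rx_closing || entry->tx_closing) {
		/* Clear the close state, refresh the timeout, and tell the
		 * caller the entry was re-opened by this SYN. */
		entry->rx_closing = false;
		entry->tx_closing = false;
		entry->lifetime = tcp_lifetime;
		return CT_REOPENED;
	}
	return CT_ESTABLISHED;
}

int main(void)
{
	struct ct_entry closing = { .rx_closing = true, .lifetime = 10 };
	enum ct_status ret = ct_lookup_syn(&closing, 21600);

	/* Only CT_NEW and CT_REOPENED trigger a policy-verdict notification. */
	if (ret == CT_NEW || ret == CT_REOPENED)
		printf("emit policy verdict (ret=%d)\n", ret);
	return 0;
}
```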
CI is green except for checkpatch, but we can ignore that particular warning (it would require a lot more changes to fix).
CT_REOPENED was originally added in #13340 to emit policy verdicts for apparently re-opened TCP connections, which are in fact more likely to be newly opened TCP connections rather than re-opened ones, as the CT entries may live minutes after the TCP state on the endpoints has already timed out. This added complexity to call sites, forcing differentiation between CT_NEW and CT_REOPENED. In all cases some CT entry field values were left stale, e.g. 'proxy_redirect' after a policy change.
Instead of adjusting each call site to behave properly for CT_REOPENED, return CT_NEW instead, and make the observable CT lookup behavior the same as for CT_NEW in that case, most notably by not updating the passed-in `*ct_state'. This change fixes a proxy redirection bug where return packets are not redirected to an L7 proxy when (a stale) CT entry is missing the 'proxy_redirect' flag.
Fixes: #27762
Fixes: #13340
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
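A minimal sketch of the observable behavior after this follow-up change, under the same simplifying assumptions as the sketch above (hypothetical names and fields, not the real CT lookup implementation): the stale entry is reported as CT_NEW and the caller's ct_state is left untouched, so possibly stale fields such as proxy_redirect are never copied out of it.

```c
#include <stdbool.h>
#include <stddef.h>

enum ct_status { CT_NEW, CT_ESTABLISHED };

struct ct_state {
	bool proxy_redirect;   /* may be stale on a re-used entry */
};

struct ct_entry {
	bool closing;          /* entry has seen a FIN/RST        */
	bool proxy_redirect;
};

/* Hypothetical lookup: a SYN hitting a closing entry is treated exactly
 * like a new connection, and *ct_state is not filled from the old entry. */
static enum ct_status ct_lookup_sketch(const struct ct_entry *entry,
				       struct ct_state *ct_state, bool syn)
{
	if (!entry || (syn && entry->closing))
		return CT_NEW;

	ct_state->proxy_redirect = entry->proxy_redirect;
	return CT_ESTABLISHED;
}
```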
When a TCP SYN packet hits a conntrack entry that is in close-wait state, the entry timeout will be modified and the entry will be reopened. But because the return code is CT_ESTABLISHED, no policy-verdict event will be generated even though this is a new connection. This patch fixes the issue by returning CT_REOPENED instead in this case. Other than policy verdict events, all other handling is the same as if CT_ESTABLISHED were returned.
Signed-off-by: Zang Li <zangli@google.com>
Fixes: #13339