Skip to content

cilium-cli: extend no-interrupted-connections to test Egress Gateway #38193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 21, 2025

Conversation

ysksuzuki
Copy link
Member

This PR extends the conn-disrupt-test to ensure that connections through a Egress Gateway are not disrupted.

The test deploys a conn-disrupt-test server with a single replica on an external node, a client on a gateway node and another client on a non gateway node. Clients establish long-lived TCP connections with the server through the gateway node.

After upgrade, the test checks the restart counters, and compares them with the counters stored before the upgrade. A mismatch indicates that a connection was interrupted.

Example output from kubectl -n cilium-test-1 get po -o wide:

NAME                                                        READY   STATUS    RESTARTS      AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
test-conn-disrupt-client-egw-gw-node-859d6b6875-vkvh2       1/1     Running   0             15s   10.244.3.206   kind-worker          <none>           <none>
test-conn-disrupt-client-egw-non-gw-node-6c69d885df-7z6c5   1/1     Running   0             15s   10.244.1.14    kind-worker2         <none>           <none>
test-conn-disrupt-server-egw-644f65cb-vz8v7                 1/1     Running   0             17s   172.18.0.4     kind-worker3         <none>           <none>

Fixes: #37092

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 14, 2025
@github-actions github-actions bot added cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary labels Mar 14, 2025
@ysksuzuki ysksuzuki added the release-note/ci This PR makes changes to the CI. label Mar 14, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 14, 2025
@ysksuzuki ysksuzuki force-pushed the pr/ysksuzuki/conn-disrupt-test-egw branch from 83134b8 to 905ea00 Compare March 14, 2025 07:40
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki ysksuzuki force-pushed the pr/ysksuzuki/conn-disrupt-test-egw branch from 905ea00 to ae37941 Compare March 14, 2025 09:24
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki ysksuzuki force-pushed the pr/ysksuzuki/conn-disrupt-test-egw branch from ae37941 to 92d57b7 Compare March 17, 2025 02:56
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki ysksuzuki marked this pull request as ready for review March 17, 2025 04:56
@ysksuzuki ysksuzuki requested review from a team as code owners March 17, 2025 04:56
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A couple of minor comments inline.

@asauber
Copy link
Member

asauber commented Mar 17, 2025

The new test is being skipped in CI. I would prefer to see a run where it passes before approving.

https://github.com/cilium/cilium/actions/runs/13890711080/job/38862132656#step:27:258

@ysksuzuki ysksuzuki force-pushed the pr/ysksuzuki/conn-disrupt-test-egw branch 2 times, most recently from 3f81844 to 8c89991 Compare March 19, 2025 14:51
@ysksuzuki
Copy link
Member Author

/test

This commit extends the conn-disrupt-test to ensure that connections
through a Egress Gateway are not disrupted.

The test deploys a conn-disrupt-test server with a single replica
on an external node, a client on a gateway node and another client
on a non gateway node. Clients establish long-lived TCP connections
with the server through the gateway node.

After upgrade, the test checks the restart counters, and compares them
with the counters stored before the upgrade. A mismatch indicates that
a connection was interrupted.

Example output from `kubectl -n cilium-test-1 get po -o wide`:
```
NAME                                                        READY   STATUS    RESTARTS      AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
test-conn-disrupt-client-egw-gw-node-859d6b6875-vkvh2       1/1     Running   0             15s   10.244.3.206   kind-worker          <none>           <none>
test-conn-disrupt-client-egw-non-gw-node-6c69d885df-7z6c5   1/1     Running   0             15s   10.244.1.14    kind-worker2         <none>           <none>
test-conn-disrupt-server-egw-644f65cb-vz8v7                 1/1     Running   0             17s   172.18.0.4     kind-worker3         <none>           <none>
```

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
This commit enables conn-disrupt-test for Egress Gateway.

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Restarting the agent disrupts the connection through the EGW
when lb-acceleration is enabled due to #37431
Let's skip it for now

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Currently, the cross-cluster EGW policies are not supported.
Let's skip it for now.

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
@ysksuzuki ysksuzuki force-pushed the pr/ysksuzuki/conn-disrupt-test-egw branch from 8c89991 to 656594e Compare March 20, 2025 00:58
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki
Copy link
Member Author

@pippolo84 Gentle ping for review🙏

@pippolo84
Copy link
Member

@pippolo84 Gentle ping for review🙏

Sorry for the delay, I'm gonna review this tomorrow 👍

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🚀

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 21, 2025
@julianwiedmann julianwiedmann added upgrade-impact This PR has potential upgrade or downgrade impact. feature/egress-gateway Impacts the egress IP gateway feature. labels Mar 21, 2025
@julianwiedmann julianwiedmann added this pull request to the merge queue Mar 21, 2025
Merged via the queue into main with commit 28f37b7 Mar 21, 2025
287 of 288 checks passed
@julianwiedmann julianwiedmann deleted the pr/ysksuzuki/conn-disrupt-test-egw branch March 21, 2025 10:33
@julianwiedmann julianwiedmann added the backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. label Mar 26, 2025
@julianwiedmann julianwiedmann added backport/author The backport will be carried out by the author of the PR. needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Apr 3, 2025
@julianwiedmann julianwiedmann removed the needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch label Apr 8, 2025
@github-actions github-actions bot added backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. and removed backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. labels Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/author The backport will be carried out by the author of the PR. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary feature/egress-gateway Impacts the egress IP gateway feature. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

egressgw: test for disruptions during upgrade
7 participants