-
Notifications
You must be signed in to change notification settings - Fork 3.4k
cilium-cli: extend no-interrupted-connections to test NodePort from outside #37294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/test |
b9785bf
to
341684d
Compare
/test |
Fly-by comment, but I see this is making changes to test "from outside" which can often be a source of unreliable test results in CI runs (eg, passes sometimes but fails sometimes). I think it would be worthwhile to heavily stress these changes by running CI many times in order to demonstrate that the test changes have high reliability. |
0c0ea09
to
620378f
Compare
/test |
Looks like ci-e2e-upgrade failed due to a flake. opened #37325 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Some nits.
…utside This commit extends the conn-disrupt-test to ensure that external connections through a NodePort are not disrupted. The test deploys a conn-disrupt-test server with a single replica in the cluster and multiple clients on an external node. Clients establish long-lived TCP connections with the server using a NodePort. Each client selects a node where the backend is running, and one where the backend is not running. The clients send requests to both nodes using IPv4 and IPv6 addresses. After upgrade, the test checks the restart counters, and compares them with the counters stored before the upgrade. A mismatch indicates that a connection was interrupted. Example output from `kubectl -n cilium-test-1 get po -o wide`: ``` NAME READY STATUS RESTARTS AGE IP NODE test-conn-disrupt-client-backend-node-ipv4-internalip-7fc97pbp9 1/1 Running 0 34s 172.18.0.2 kind-worker3 test-conn-disrupt-client-backend-node-ipv6-internalip-66bdtmg2z 1/1 Running 0 34s 172.18.0.2 kind-worker3 test-conn-disrupt-client-non-backend-node-ipv4-internalip-d6bns 1/1 Running 0 34s 172.18.0.2 kind-worker3 test-conn-disrupt-client-non-backend-node-ipv6-internalip-4x77p 1/1 Running 0 34s 172.18.0.2 kind-worker3 test-conn-disrupt-server-ns-traffic-5cc46556fb-bm9lz 1/1 Running 0 37s 10.244.2.173 kind-worker2 ``` Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
include-conn-disrupt-test-ns-traffic is false by default, and we need to explicitly set it to true to run the conn-disrupt-test for NS traffic. This is to prevent CI from disrupting when updating cilium-cli on the stable branch. Also, this is intended to provide a way to opt out if this test causes instability in CI. Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
lb-acceleration is enabled or ipsec and KPR is enabled. ci-e2e-upgrade and ci-ipsec-upgrade become unstable with those settings. Skip it for now. Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
620378f
to
24ffc76
Compare
This commit enables conn-disrupt-test for NS traffic in ci-e2e-upgrade, ci-ipsec-upgrade and ci-ipsec-e2e. It will be skipped in conformance-eks, ci-clustermesh upgrade and ci-ces-migrate because the feature node-without-cilium is disabled. (The one for EW traffic will run) Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
24ffc76
to
22fdd9a
Compare
/test |
Just to emphasize - the IPsec workflows would then currently run this test in any configuration where KPR is switched off, correct? |
@joestringer At least for the kind-based CI parts this should be not a big concern - the "outside" entity is really just another kind node that's not part of the cluster. We use the same approach in the EgressGW tests for instance, and I'm not aware that it's adding much flakiness to the results. But ack on the overall sentiment - I like that the PR is enabling this new test very selectively, and for instance we're not exposing older Cilium releases to it when bumping the CLI version in CI. |
Yes, that is correct. |
This PR extends the conn-disrupt-test to ensure that external connections through a NodePort are not disrupted.
The test deploys a conn-disrupt-test server with a single replica in the cluster and multiple clients on an external node.
Clients establish long-lived TCP connections with the server using a NodePort. Each client selects a node where the backend is running, and one where the backend is not running. The clients send requests to both nodes using IPv4 and IPv6 addresses.
After upgrade, the test checks the restart counters, and compares them with the counters stored before the upgrade. A mismatch indicates that a connection was interrupted.
Example output from
kubectl -n cilium-test-1 get po -o wide
:The conn-disrupt-test for NS traffic runs in the following workflows
It will be skipped in the following because the feature node-without-cilium is disabled (The one for EW traffic will run)
Note that ci-e2e-upgrade and ci-ipsec-upgrade become unstable if we enable no-interrupted-connections for NS traffic test. Will skip these conditions (lb-acceleration and ipsec + KPR + ipv6) for now, and investigate the cause as follow-up.
ci-e2e-upgrade (lb-acceleration enabled)
https://github.com/cilium/cilium/actions/runs/12984833400/job/36208601353
ci-ipsec-upgrade (ipsec + KPR + ipv6)
https://github.com/cilium/cilium/actions/runs/12982469981/job/36203776562
Fixes: #13530