
@MrFreezeex (Member) commented May 29, 2025

Follow-up to #39338, adding connectivity test support for this new mode.

Also, draft PR #39731 contains the same CLI commits, but with the new option enabled by default so that we get a full CI test run with it turned on.

cli: add support for policy-default-local-cluster in connectivity tests

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 29, 2025
@github-actions github-actions bot added the cilium-cli This PR contains changes related with cilium-cli label May 29, 2025
@MrFreezeex (Member, Author)

/test

@MrFreezeex MrFreezeex added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed cilium-cli This PR contains changes related with cilium-cli labels May 29, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 29, 2025
@MrFreezeex MrFreezeex added cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary labels May 29, 2025
@MrFreezeex MrFreezeex changed the title cilium-cli: add connectivity tests for policy-default-local-cluster cilium-cli: add connectivity tests support for policy-default-local-cluster May 29, 2025
@MrFreezeex (Member, Author)

/ci-ingress

@MrFreezeex MrFreezeex marked this pull request as ready for review May 29, 2025 13:45
@MrFreezeex MrFreezeex requested review from a team as code owners May 29, 2025 13:45
@joestringer (Member) left a comment

Did we consider how to test this functionality without end-to-end tests, for instance using hive/script? As a general observation, e2e tests tend to fail at a higher rate due to integration problems and they also fail in ways that are not obvious to the general contributor. We already struggle a lot with the existing e2e tests, and I worry that each time we add one more permutation into the tests, we make that problem worse.

From a pure code perspective this looks good for @cilium/github-sec code owners.

@MrFreezeex (Member, Author) commented May 29, 2025

> Did we consider how to test this functionality without end-to-end tests, for instance using hive/script? As a general observation, e2e tests tend to fail at a higher rate due to integration problems and they also fail in ways that are not obvious to the general contributor. We already struggle a lot with the existing e2e tests, and I worry that each time we add one more permutation into the tests, we make that problem worse.

If it's reassuring, the goal of this new option is to eventually make this the default behavior and probably to deprecate/remove the option entirely. If I can contribute the second missing piece, a reporting tool that flags policies whose behavior would change (either in preflight or in cilium-cli, TBD), in time for 1.18, it could already become the default in 1.19 and be deprecated/removed somewhere around 1.20/1.21 (to be discussed, of course!). The original PR also includes some unit tests.

That being said, if you feel it needs additional testing through hive/script, I can look into that too.

@christarazi christarazi previously approved these changes May 29, 2025
@joestringer (Member)

> if you feel like it needs some additional testing through hive/script I can take a look at doing that too.

I can't say I'm familiar enough with this particular feature or area to give good technical guidance on this. Maybe @cilium/sig-clustermesh reviewers could consider this and provide pointers on whether there's a concrete improvement we could make or whether these changes are good enough.

@christarazi christarazi dismissed their stale review May 29, 2025 18:08

Misclick

@christarazi (Member) left a comment

From my ci-structure point of view, this seems like a very heavy-handed way to test clustermesh policy functionality. This PR touches basically every manifest in the Cilium CLI. I agree with Joe here that exploring alternative means is probably the best move. Otherwise, we are adding another dimension to an already complex and error-prone test suite.

@MrFreezeex (Member, Author) commented May 29, 2025

I now realize I provided far too little context here, sorry about that!

In the current cilium-cli connectivity tests, in a multicluster scenario we deploy the (IIRC) "other-node" pod to another cluster instead of another node. This lets us run the same tests whether there are multiple clusters or only one. Without this mode turned on, policies had no need to be aware of their cluster mesh environment: unless explicitly specified otherwise, a network policy allowed traffic to every cluster. The new policy-default-local-cluster option essentially inverts this behavior: network policies now need to explicitly target clusters, and if none is specified they only target the local one. Since this is precisely the point of the new option, and given the way we run connectivity tests, network policies now also have to explicitly include both clusters to cover the case where the "other node" pod is deployed in another cluster.

Apart from adding one more option to the test matrix (a totally fair concern!), this is mostly not a CI/testing problem: if a user runs the connectivity tests with this option turned on, they should succeed. So unless we give up on the strategy of deploying a pod to another cluster to run most tests, we unfortunately need to update most of the netpol manifests used by the CLI connectivity tests :/...
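To make the semantics above concrete, here is a minimal Go sketch of how a peer selector's cluster scope resolves with and without policy-default-local-cluster. It is a simplified model under stated assumptions, not Cilium's policy engine: the `peerSelector` type, the `matchesCluster` function, and the cluster names `cluster1`/`cluster2` are hypothetical. In real manifests the cluster is typically expressed through the `io.cilium.k8s.policy.cluster` label.

```go
package main

import (
	"fmt"
	"slices"
)

// peerSelector is a hypothetical, simplified stand-in for a policy peer
// selector. Clusters lists the cluster names the selector explicitly
// targets; an empty list means no cluster was specified.
type peerSelector struct {
	Clusters []string
}

// matchesCluster reports whether sel applies to a peer running in
// peerCluster, evaluated from localCluster. defaultLocal models the
// policy-default-local-cluster option (hypothetical model, not Cilium code).
func matchesCluster(sel peerSelector, peerCluster, localCluster string, defaultLocal bool) bool {
	if len(sel.Clusters) > 0 {
		// An explicit cluster list behaves the same regardless of the option.
		return slices.Contains(sel.Clusters, peerCluster)
	}
	if !defaultLocal {
		// Previous behavior: an unspecified cluster matches every cluster.
		return true
	}
	// New behavior: an unspecified cluster matches only the local cluster.
	return peerCluster == localCluster
}

func main() {
	unscoped := peerSelector{}
	bothClusters := peerSelector{Clusters: []string{"cluster1", "cluster2"}}

	for _, defaultLocal := range []bool{false, true} {
		// Evaluate both selectors against a peer in the remote cluster.
		fmt.Printf("defaultLocal=%v unscoped->remote=%v explicit->remote=%v\n",
			defaultLocal,
			matchesCluster(unscoped, "cluster2", "cluster1", defaultLocal),
			matchesCluster(bothClusters, "cluster2", "cluster1", defaultLocal))
	}
}
```

In this model, only the unscoped selector changes verdict when the option is enabled; listing both clusters explicitly, as the updated connectivity-test manifests do, keeps the remote "other node" peer reachable either way.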

@christarazi (Member) left a comment

Thanks for the context, that helps. From a ci-structure point of view, the changes make sense and are as minimal as I can think of.

@giorio94 (Member) left a comment

Thanks! A couple of comments inline.

@MrFreezeex MrFreezeex force-pushed the pr/mrfreezeex/conn-test-default-policy-local-cluster branch 2 times, most recently from f96c54f to 1d00621 Compare May 30, 2025 12:39
@MrFreezeex (Member, Author)

/test

Add a new test for when policy-default-local-cluster is enabled, using a
policy that doesn't specify any cluster to filter on. In a multicluster
scenario this means traffic to the remote cluster is denied, while in a
single cluster the traffic should be allowed.
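The expected outcome of this new test can be summarized as a small table-driven check. This is a minimal Go sketch, assuming the client runs in the local cluster and that, in multicluster runs, the "other node" peer is placed in the remote cluster; the `scenario` type and `expectAllowed` function are hypothetical and not part of cilium-cli.

```go
package main

import "fmt"

// scenario is a hypothetical description of one run of the new test, where
// the policy specifies no cluster and policy-default-local-cluster is enabled.
type scenario struct {
	name          string
	clientCluster string
	serverCluster string
}

// expectAllowed returns the expected verdict: with no cluster specified,
// the policy only selects peers in the local cluster, so traffic is
// expected to pass only when both endpoints share a cluster.
func expectAllowed(s scenario) bool {
	return s.clientCluster == s.serverCluster
}

func main() {
	scenarios := []scenario{
		// Single cluster: the "other node" peer lives in the same cluster.
		{name: "single-cluster", clientCluster: "cluster1", serverCluster: "cluster1"},
		// Multicluster: the "other node" peer is deployed in the remote cluster.
		{name: "multicluster", clientCluster: "cluster1", serverCluster: "cluster2"},
	}
	for _, s := range scenarios {
		verdict := "denied"
		if expectAllowed(s) {
			verdict = "allowed"
		}
		fmt.Printf("%s: %s\n", s.name, verdict)
	}
}
```

Keeping the expectation as a pure function of the two endpoints' clusters lets the same test body run unchanged in single-cluster and multicluster setups, mirroring how the connectivity tests reuse the "other node" placement.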

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
@MrFreezeex MrFreezeex force-pushed the pr/mrfreezeex/conn-test-default-policy-local-cluster branch from 1d00621 to 000a08c Compare May 30, 2025 13:02
@MrFreezeex (Member, Author)

/test

@MrFreezeex MrFreezeex requested a review from giorio94 May 30, 2025 13:29
@MrFreezeex MrFreezeex added the area/clustermesh Relates to multi-cluster routing functionality in Cilium. label Jun 1, 2025
@giorio94 (Member) left a comment

Thanks!

@giorio94 giorio94 enabled auto-merge June 3, 2025 07:19
@giorio94 giorio94 added this pull request to the merge queue Jun 5, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 5, 2025
Merged via the queue into main with commit 62e6b2a Jun 5, 2025
235 of 238 checks passed
@giorio94 giorio94 deleted the pr/mrfreezeex/conn-test-default-policy-local-cluster branch June 5, 2025 02:11
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
5 participants