Skip to content

Conversation

MrFreezeex
Copy link
Member

In commit c464e66 the kvstoremesh-specific certificate was mounted into cilium-agent. Since the recent addition of endpointslicesync the operator needs to connect to clustermesh components too meaning that the clustermesh config and secrets should be mirrored to the operator deployment as well. With the addition of the kvstoremesh secrets to the agent and not the operator this resulted into the code connecting to the kvstoremesh not working in the operator context which is now the default as well.

Fix operator deployment connecting to clustermesh kvstoremesh when endpointslice sync or MCS-API Service exports is enabled

In commit c464e66 the
kvstoremesh-specific certificate was mounted into cilium-agent. Since
the recent addition of endpointslicesync the operator needs to connect to
clustermesh components too meaning that the clustermesh config and
secrets should be mirrored to the operator deployment as well.
With the addition of the kvstoremesh secrets to the agent and not the operator
this resulted into the code connecting to the kvstoremesh not working in the
operator context which is now the default as well.

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
@MrFreezeex MrFreezeex requested review from a team as code owners August 11, 2024 12:56
@MrFreezeex MrFreezeex requested a review from youngnick August 11, 2024 12:56
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 11, 2024
Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from sig-k8s POV.

@MrFreezeex
Copy link
Member Author

/test

@gandro
Copy link
Member

gandro commented Aug 12, 2024

@cilium/sig-clustermesh Could you take a look as well?

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helm structure looks good to me, though I lack the knowledge about clustermesh to understand the implications

@gandro gandro added release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/helm Impacts helm charts and user deployment experience labels Aug 12, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Aug 12, 2024
@giorio94 giorio94 self-requested a review August 19, 2024 10:10
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! This is definitely an indicator that it is time to extract that configuration into a reusable template, as otherwise it will diverge over time.

Before approving, I'd like to still figure out why the E2E test supposed to exercise this feature didn't fail, given that the operator cannot connect to kvstoremesh.

@MrFreezeex
Copy link
Member Author

Thanks for the review!

Before approving, I'd like to still figure out why the E2E test supposed to exercise this feature didn't fail, given that the operator cannot connect to kvstoremesh.

Uh indeed that's true, my assumption was that kvstoremesh and endpointslicesync was not a covered scenario but that's not true so there must be something wrong with the e2e tests indeed :/. Will take a look!

@giorio94
Copy link
Member

Thanks for the review!

Before approving, I'd like to still figure out why the E2E test supposed to exercise this feature didn't fail, given that the operator cannot connect to kvstoremesh.

Uh indeed that's true, my assumption was that kvstoremesh and endpointslicesync was not a covered scenario but that's not true so there must be something wrong with the e2e tests indeed :/. Will take a look!

I found the issue. I'll open a PR shortly, after double checking that the fix works as expected.

@giorio94
Copy link
Member

#34455 should fix it.

@giorio94 giorio94 added the needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch label Aug 19, 2024
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good to me, let's get it merged, and then I'll rebase the connectivity test fix on top of it. I've additionally marked this PR for backport to v1.16, as the original agent change had been included there (although it was not yet strictly enforced to prevent issues on upgrade, and the operator is less prone due to the limited number of replicas).

Then, we can extract the configuration into a reusable template as a followup item.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 19, 2024
@ldelossa ldelossa added this pull request to the merge queue Aug 20, 2024
Merged via the queue into cilium:main with commit 3f5c699 Aug 20, 2024
74 of 76 checks passed
@MrFreezeex MrFreezeex deleted the fix-operator-clustermesh-kvstoremesh branch August 21, 2024 10:56
@tklauser tklauser mentioned this pull request Aug 27, 2024
13 tasks
@tklauser tklauser added backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Aug 27, 2024
@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/helm Impacts helm charts and user deployment experience backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants