Skip to content

Silence spurious clustermesh-related warnings #35867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 12, 2024

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Nov 8, 2024

Please refer to the individual commits for additional details.

Fixes: #35865
Marked for backport to v1.16 to prevent issues on downgrade when warning logs start being flagged in CI.

Don't emit a spurious warning in case remote cluster configuration
retrieval gets aborted due to the parent context being canceled
(e.g., due to reconnecting to etcd), as already logged elsewhere,
and potentially misleading to the users.

Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, a warning log gets always emitted in case the list operation
to start a watcher fails, even though it will automatically retried.
Given that we are starting to flag warning logs in CI, and this can
briefly occur in the clustermesh context during the initialization phase
(especially when the authorization mode is configured to cluster, until
permissions are granted), let's lower the severity for the first few
occurrences, and emit a warning only if the situation does not improve.

Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Explicitly configure the clustermesh-sync-timeout parameter to a higher
value to prevent emitting spurious warning logs during the initial
connection phase when the authorization mode is configured to cluster.
Indeed, Cilium agents get restarted at that point (due to the
configuration of host aliases), while the clustermesh-apiserver does
not (assuming that KVStoreMesh is disabled), potentially requiring up
to one minute to react to the configmap change and create the associated
user in etcd. Still, the warning in this case would have been benign,
because no cross-cluster connections could be present, as we are meshing
the two clusters for the first time right now.

Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. area/kvstore Impacts the KVStore package interactions. needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Nov 8, 2024
@giorio94
Copy link
Member Author

giorio94 commented Nov 8, 2024

/test

@giorio94 giorio94 marked this pull request as ready for review November 8, 2024 15:33
@giorio94 giorio94 requested review from a team as code owners November 8, 2024 15:33
@giorio94 giorio94 requested review from marseel and aanm November 8, 2024 15:33
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 12, 2024
@squeed squeed added this pull request to the merge queue Nov 12, 2024
Merged via the queue into main with commit 24acb24 Nov 12, 2024
289 checks passed
@squeed squeed deleted the pr/giorio94/main/clustermesh-warnings branch November 12, 2024 10:08
@giorio94 giorio94 added backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Nov 12, 2024
@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Nov 12, 2024
@@ -158,7 +158,12 @@ func (rc *remoteCluster) restartRemoteConnection() {
if backend != nil {
backend.Close()
}
rc.logger.WithError(err).Warning("Unable to establish etcd connection to remote cluster")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giorio94 I see you did not backport this change to v1.16. Was that intentional? I'm seeing it on downgrades to v1.16: https://github.com/cilium/cilium/actions/runs/11935307422/job/33266363458. Should I just allowlist it until v1.17 is out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing. It seems I mistakenly linked a different backport to this one, and marked this one as completed (rather than the other one). Marking again for backport, so that it gets included in the next round.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, that's why the commits looked so different 😆

@giorio94 giorio94 added needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch and removed backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. labels Nov 21, 2024
@giorio94 giorio94 mentioned this pull request Nov 28, 2024
1 task
@giorio94 giorio94 added backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Nov 28, 2024
@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/kvstore Impacts the KVStore package interactions. backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warnings on clustermesh startup
6 participants