Silence spurious clustermesh-related warnings #35867
Conversation
Don't emit a spurious warning in case remote cluster configuration retrieval gets aborted due to the parent context being canceled (e.g., due to reconnecting to etcd), as the cancellation is already logged elsewhere and the extra warning is potentially misleading to users. Reported-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
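A minimal sketch of the idea behind this commit, assuming a logrus-style logger; the function names are illustrative, not the actual Cilium code:

```go
package example

import (
	"context"
	"errors"

	"github.com/sirupsen/logrus"
)

// fetchRemoteConfig stands in for the real retrieval of the remote
// cluster configuration from the kvstore (hypothetical helper).
func fetchRemoteConfig(ctx context.Context) error {
	return ctx.Err()
}

func retrieveConfig(ctx context.Context, log *logrus.Entry) {
	if err := fetchRemoteConfig(ctx); err != nil {
		// If the parent context was canceled (e.g., the etcd connection
		// is being re-established), the event is already logged by the
		// code tearing down the connection: stay silent here.
		if errors.Is(err, context.Canceled) {
			return
		}
		log.WithError(err).Warning("Unable to retrieve remote cluster configuration")
	}
}
```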
Currently, a warning log is always emitted when the list operation to start a watcher fails, even though it will automatically be retried. Given that we are starting to flag warning logs in CI, and this can briefly occur in the clustermesh context during the initialization phase (especially when the authorization mode is configured to cluster, until permissions are granted), let's lower the severity for the first few occurrences and emit a warning only if the situation does not improve. Reported-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
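As a rough illustration of the graded severity described above (the threshold and names are assumptions, not the values used by Cilium):

```go
package example

import "github.com/sirupsen/logrus"

// warnAfterAttempts is an assumed threshold; the real implementation may
// use a different count or a time-based condition.
const warnAfterAttempts = 3

// logListFailure demotes the first few list failures to info severity,
// since they are expected while permissions are still being granted
// (authorization mode "cluster"), and escalates to a warning only if
// the failures persist across retries.
func logListFailure(log *logrus.Entry, attempt int, err error) {
	entry := log.WithError(err).WithField("attempt", attempt)
	if attempt < warnAfterAttempts {
		entry.Info("List operation failed, retrying")
		return
	}
	entry.Warning("List operation keeps failing")
}
```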
Explicitly configure the clustermesh-sync-timeout parameter to a higher value to prevent emitting spurious warning logs during the initial connection phase when the authorization mode is configured to cluster. Indeed, Cilium agents get restarted at that point (due to the configuration of host aliases), while the clustermesh-apiserver does not (assuming that KVStoreMesh is disabled), potentially requiring up to one minute to react to the configmap change and create the associated user in etcd. Still, the warning in this case would have been benign, because no cross-cluster connections could be present, as we are meshing the two clusters for the first time right now. Reported-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
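The timeout's role can be pictured with a small watchdog sketch (names and structure are illustrative, not the actual Cilium implementation): if remote clusters have not synchronized within clustermesh-sync-timeout, a warning fires, so a first-time mesh whose apiserver may take up to a minute to pick up the configmap change can trip it unless the timeout is raised.

```go
package example

import (
	"context"
	"time"

	"github.com/sirupsen/logrus"
)

// waitForSync warns if synchronization with remote clusters does not
// complete within the configured timeout. Raising the timeout above the
// apiserver's reaction time avoids the benign warning observed while
// meshing two clusters for the first time.
func waitForSync(ctx context.Context, synced <-chan struct{}, timeout time.Duration, log *logrus.Entry) {
	select {
	case <-synced:
		// All remote clusters synchronized in time; nothing to log.
	case <-time.After(timeout):
		log.Warning("Clustermesh synchronization timed out")
	case <-ctx.Done():
		// Shutting down; the cancellation is reported elsewhere.
	}
}
```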
/test
@@ -158,7 +158,12 @@ func (rc *remoteCluster) restartRemoteConnection() {
		if backend != nil {
			backend.Close()
		}
		rc.logger.WithError(err).Warning("Unable to establish etcd connection to remote cluster")
@giorio94 I see you did not backport this change to v1.16. Was that intentional? I'm seeing it on downgrades to v1.16: https://github.com/cilium/cilium/actions/runs/11935307422/job/33266363458. Should I just allowlist it until v1.17 is out?
Thanks for noticing. It seems I mistakenly linked a different backport to this one and marked this one as completed rather than the other. Marking again for backport, so that it gets included in the next round.
Aah, that's why the commits looked so different 😆
Please refer to the individual commits for additional details.
Fixes: #35865
Marked for backport to v1.16 to prevent issues on downgrade when warning logs start being flagged in CI.