Skip to content

Conversation

giorio94
Copy link
Member

Controller runtime emits an error log whenever a reconcile function terminates with an error. However, reconciliation errors are harmless in most cases, as the subsequent iteration will normally pull the updated state and succeed. Still, this breaks the Cilium's assumption that error logs are emitted only in case of serious conditions, causing potential user confusion, as well as triggering the error logs check run as part of the E2E tests.

Let's introduce a wrapper responsible for converting "Reconciler error" messages to info level logs, when the returned error is likely to be transient. We specifically check the error type to avoid hiding unexpected ones, although more types could be added in the future if deemed helpful.

@giorio94 giorio94 added area/operator Impacts the cilium-operator component release-note/misc This PR makes changes that have no direct user impact. labels Oct 11, 2024
@giorio94 giorio94 requested review from a team as code owners October 11, 2024 15:52
@giorio94 giorio94 requested a review from sayboras October 11, 2024 15:52
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 force-pushed the mio/controller-runtime-log branch from f96d81b to 0ccadb7 Compare October 15, 2024 09:30
@giorio94
Copy link
Member Author

I'm tentatively marking this PR for backport to v1.16, as it aims to reduce CI flakiness by avoiding that temporary errors are flagged by the dedicated connectivity test. Even though it slightly changes the user-facing behavior, I'd argue that the reporting temporary errors that will automatically self-heal as error log was more confusing.

@giorio94 giorio94 added backport/author The backport will be carried out by the author of the PR. needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Oct 15, 2024
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✔️

@sayboras
Copy link
Member

I'm tentatively marking this PR for backport to v1.16, as it aims to reduce CI flakiness by avoiding that temporary errors are flagged by the dedicated connectivity test. Even though it slightly changes the user-facing behavior, I'd argue that the reporting temporary errors that will automatically self-heal as error log was more confusing.

If it's the case, we might need to backport the below PR as well.

#35253

@giorio94
Copy link
Member Author

If it's the case, we might need to backport the below PR as well.

I was actually thinking of backporting this one only, and slightly adapting the approach to work on top of logrus rather than slog, but that's definitely another possibility.

@giorio94 giorio94 force-pushed the mio/controller-runtime-log branch from 0ccadb7 to 274d7c8 Compare October 15, 2024 10:05
@giorio94
Copy link
Member Author

Pushed to fix a linting error

@giorio94
Copy link
Member Author

/test

@sayboras sayboras force-pushed the mio/controller-runtime-log branch from 274d7c8 to 771a927 Compare October 15, 2024 11:17
@sayboras
Copy link
Member

/test

Controller runtime emits an error log whenever a reconcile function
terminates with an error. However, reconciliation errors are harmless
in most cases, as the subsequent iteration will normally pull the
updated state and succeed. Still, this breaks the Cilium's assumption
that error logs are emitted only in case of serious conditions,
causing potential user confusion, as well as triggering the error
logs check run as part of the E2E tests.

Let's introduce a wrapper responsible for converting "Reconciler error"
messages to info level logs, when the returned error is likely to be
transient. We specifically check the error type to avoid hiding unexpected
ones, although more types could be added in the future if deemed helpful.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Copy link
Member Author

Rebased onto main to hopefully make CI happier.

@giorio94 giorio94 force-pushed the mio/controller-runtime-log branch from 771a927 to f747c7f Compare October 21, 2024 07:54
@giorio94
Copy link
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 21, 2024
@julianwiedmann julianwiedmann added this pull request to the merge queue Oct 22, 2024
Merged via the queue into cilium:main with commit e2b17ec Oct 22, 2024
63 checks passed
@giorio94 giorio94 deleted the mio/controller-runtime-log branch October 28, 2024 13:38
@giorio94 giorio94 mentioned this pull request Oct 28, 2024
1 task
@giorio94 giorio94 added the backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. label Oct 28, 2024
@giorio94 giorio94 removed the needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch label Oct 28, 2024
@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operator Impacts the cilium-operator component backport/author The backport will be carried out by the author of the PR. backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants