Skip to content

Conversation

tommyp1ckles
Copy link
Contributor

@tommyp1ckles tommyp1ckles commented Nov 12, 2024

Fixes: #31374

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 12, 2024
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/klog-override-matcher branch from 95006b0 to c743ebd Compare November 13, 2024 01:19
@tommyp1ckles tommyp1ckles changed the title Pr/tp/klog override matcher Logging: Add klog override matcher to remap certain errors to "info" level Nov 13, 2024
@tommyp1ckles tommyp1ckles added area/CI Continuous Integration testing issue or flake area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. release-note/misc This PR makes changes that have no direct user impact. ci/flake This is a known failure that occurs in the tree. Please investigate me! labels Nov 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 13, 2024
@tommyp1ckles tommyp1ckles force-pushed the pr/tp/klog-override-matcher branch from c743ebd to 0d43ee9 Compare November 13, 2024 01:27
@tommyp1ckles tommyp1ckles marked this pull request as ready for review November 13, 2024 01:27
@tommyp1ckles tommyp1ckles requested a review from a team as a code owner November 13, 2024 01:27
@tklauser
Copy link
Member

Instead of remapping the errors to info level, have you considered just ignoring the error in the connectivity tests, like we do for other error logs?

@tklauser
Copy link
Member

Upon closer inspection, it seems like we already ignore it:

failedToUpdateLock stringMatcher = "Failed to update lock:"

and

failedToUpdateLock, failedToReleaseLock,

Any idea why this is not enough to ignore it?

@tommyp1ckles
Copy link
Contributor Author

tommyp1ckles commented Nov 15, 2024

Instead of remapping the errors to info level, have you considered just ignoring the error in the connectivity tests, like we do for other error logs?

@tklauser This fundamentally doesn't feel like a error to me, hence the thinking was that we still want to display it to users, but it would be misleading to emit it as a error since it doesn't inherently imply any degradation.

This is quite a bit of change for such a small issue, but the intention was to setup the code to future such remappings.

Maybe we should just propose changing the log level upstream in k8s client-go however, and ignore it for now.

@tommyp1ckles
Copy link
Contributor Author

Any idea why this is not enough to ignore it?

The semicolon is breaking the matching, the log currently appears as: Failed to update lock optimitically (think I saw you on the upstream client-go commit to fix the spelling 😄 )

@tklauser
Copy link
Member

Instead of remapping the errors to info level, have you considered just ignoring the error in the connectivity tests, like we do for other error logs?

@tklauser This fundamentally doesn't feel like a error to me, hence the thinking was that we still want to display it to users, but it would be misleading to emit it as a error since it doesn't inherently imply any degradation.

That makes sense, thanks for the explanation.

This is quite a bit of change for such a small issue, but the intention was to setup the code to future such remappings.

Let's hope we won't need it too often 😅

Maybe we should just propose changing the log level upstream in k8s client-go however, and ignore it for now.

Yeah, that would IMO be the optimal solution.

@tommyp1ckles tommyp1ckles marked this pull request as ready for review November 21, 2024 03:59
Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The misspell linter is complaining about mispelled being misspelled 😄

image

Otherwise LGTM

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/klog-override-matcher branch from 0d43ee9 to b350373 Compare November 23, 2024 01:27
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/klog-override-matcher branch from b350373 to 38c9dee Compare November 26, 2024 04:27
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/klog-override-matcher branch from 38c9dee to bb05b01 Compare November 26, 2024 05:15
@tommyp1ckles
Copy link
Contributor Author

/test

Add functionality to intercept and remap log levels of
klogs being emitted with logrus loggers.

This reimplements some of the klog functionality that
listens/writes on a io.Pipe when klog messages come in
but additionally matches lines against remappings.
If such a override matches, the log will be redirected
to the specified alternate log level logrus logger.

This will allow us to change the level of emitted klogs
logs such that users still see the log event but in the
case where it does not amount to an error as far as
Cilium functionality goes we can just reduce it down to
info, etc.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Using functionality added in previous commits, this adds a
matcher override case that will emit the error log:

`Failed to update lock optimitically ... falling back to slow path`

(note: as well as the correct spelling, as that has been fixed upstream).

Looking at the client-go case where this log happens [1], the
intended behavior is that the leader lock is assumed to be valid
with the fallback behavior being to get the election record.

This seems like a normal set of operations, so we shouldn't consider
this case a error outright.
Thus this being logged as an error is likely to be confusing to users.
However, it would be useful to still have this information logged so
we will use the klog level remapping to override this specific error.
Subsequent errors while attempting leader election will be emitted
by client-go as normal.

[1] https://github.com/kubernetes/client-go/blob/v0.31.2/tools/leaderelection/leaderelection.go#L417-L430

Fixes: cilium#31374

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@tommyp1ckles tommyp1ckles force-pushed the pr/tp/klog-override-matcher branch from bb05b01 to 2642554 Compare November 26, 2024 20:45
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles added this pull request to the merge queue Nov 27, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 27, 2024
Merged via the queue into cilium:main with commit cafa0b5 Nov 27, 2024
64 checks passed
@tommyp1ckles tommyp1ckles deleted the pr/tp/klog-override-matcher branch November 27, 2024 06:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. ci/flake This is a known failure that occurs in the tree. Please investigate me! ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: Cilium E2E Upgrade - failed to update lock
3 participants