@marseel marseel commented Apr 8, 2025

To show how aggregation and reporting work, I've switched one log message, "Updating tunnel map entry", from Debug to Error.
Example run: https://github.com/cilium/cilium/actions/runs/14337153953/job/40187065813?pr=38812

Found 1 logs in kind-chart-testing/kube-system/cilium-ms24n (cilium-agent) matching list of errors that must be investigated:
time=2025-04-08T15:50:26Z level=error source=/go/src/github.com/cilium/cilium/pkg/datapath/linux/node.go:189 msg="Updating tunnel map entry" module=agent.datapath ipAddr=172.18.0.3 allocCIDR=fd00:10:244::/64 (4 occurrences)

Note that occurrences are now counted correctly. Previously, differing timestamps prevented identical messages from being aggregated together.
The error is reported to the datapath team, which owns the file (pkg/datapath/linux/node.go) containing this log message:

check-log-errors/no-errors-in-logs/kind-chart-testing/kube-system/cilium-ms24n (cilium-agent)
    ⛑️ The following owners are responsible for reliability of the testsuite: 
        - @cilium/sig-datapath (no-errors-in-logs)
        - @cilium/sig-servicemesh (.github/workflows/conformance-kind-proxy-embedded.yaml)
        - @cilium/ci-structure (.github/workflows/conformance-kind-proxy-embedded.yaml)

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 8, 2025
@github-actions github-actions bot added the cilium-cli This PR contains changes related with cilium-cli label Apr 8, 2025
@marseel marseel added the release-note/ci This PR makes changes to the CI. label Apr 9, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 9, 2025
@marseel marseel changed the title from "Pr/marseel/codeowners for error logs" to "Assign codeowners for no-errors-in-logs testcase" Apr 9, 2025
@marseel marseel force-pushed the pr/marseel/codeowners_for_error_logs branch from 5abeeb8 to ebd50a7 Compare April 9, 2025 13:27

marseel commented Apr 9, 2025

/test

@marseel marseel requested a review from joestringer April 9, 2025 15:02
@marseel marseel marked this pull request as ready for review April 9, 2025 15:02
@marseel marseel requested review from a team as code owners April 9, 2025 15:02
@marseel marseel requested review from derailed and brlbil April 9, 2025 15:02

@joestringer joestringer left a comment

Seems like a reasonable way to do this. Hopefully we don't get a flood of failure logs, but even if we do, it makes sense to start with the most common failures anyway.

My main notes are about how reliable the slog extension is and whether we could do better for stable branches. It might even be worth splitting the cilium-agent changes into a separate, dedicated PR just to simplify backporting, but that's not a big deal.

marseel added 4 commits April 10, 2025 15:34
Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Previously, the timestamp was taken into account when counting unique logs.
As a result, almost all errors were unique, which gave no useful signal
about which error is most common. Switch to counting unique errors based
only on the msg field instead.

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
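
To illustrate the idea in the commit message above, here is a minimal sketch of grouping log lines by their msg field so that differing timestamps no longer split identical errors. This is not the code from this PR: `messageKey`, `countByMessage`, and the `count` type are hypothetical names, and the regular expression assumes logfmt-style `key=value` lines like the one in the PR description.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// msgRe captures the value of the msg field, either quoted or bare.
// Assumption: lines are logfmt-style, as in the example in the PR description.
var msgRe = regexp.MustCompile(`(?:^|\s)msg=("(?:[^"\\]|\\.)*"|\S+)`)

// messageKey returns the msg field of a log line. If the line has no msg
// field, the whole line is used as the key.
func messageKey(line string) string {
	m := msgRe.FindStringSubmatch(line)
	if m == nil {
		return line
	}
	if unquoted, err := strconv.Unquote(m[1]); err == nil {
		return unquoted
	}
	return m[1]
}

// count pairs a message with the number of lines that carried it.
type count struct {
	Msg string
	N   int
}

// countByMessage groups lines by msg only, ignoring timestamps and other
// fields, and returns the groups ordered from most to least frequent.
func countByMessage(lines []string) []count {
	counts := map[string]int{}
	for _, l := range lines {
		counts[messageKey(l)]++
	}
	out := make([]count, 0, len(counts))
	for msg, n := range counts {
		out = append(out, count{Msg: msg, N: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].N != out[j].N {
			return out[i].N > out[j].N
		}
		return out[i].Msg < out[j].Msg // stable order for equal counts
	})
	return out
}

func main() {
	lines := []string{
		`time=2025-04-08T15:50:26Z level=error msg="Updating tunnel map entry" ipAddr=172.18.0.3`,
		`time=2025-04-08T15:50:27Z level=error msg="Updating tunnel map entry" ipAddr=172.18.0.4`,
	}
	for _, c := range countByMessage(lines) {
		fmt.Printf("%s (%d occurrences)\n", c.Msg, c.N)
	}
}
```

Keying on msg alone trades precision for signal: two lines with the same message but different field values are counted together, which is what makes the most common error stand out.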
When a log is produced by slog and debug logs are enabled, it carries an
additional field with the path of the source file. On a
no-errors-in-logs failure, we can attribute the failure to the CODEOWNERS
of the file that generated the error log.
This will be especially useful for CI.

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
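
The attribution described in the commit message above could be sketched roughly as follows. This is a simplified illustration, not the implementation in this PR: `sourcePath`, `ownersFor`, and the `rule` type are hypothetical, the rules use plain path prefixes rather than real CODEOWNERS glob patterns, and the `@example/fallback` owner is made up. It keeps the CODEOWNERS convention that the last matching rule wins.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sourceRe captures the value of the source field, e.g.
// source=/go/src/github.com/cilium/cilium/pkg/datapath/linux/node.go:189
var sourceRe = regexp.MustCompile(`(?:^|\s)source=(\S+)`)

// sourcePath extracts the repository-relative file path from a log line.
// repoRoot is the build-time prefix to strip; it is an assumption based on
// the example log line in the PR description.
func sourcePath(line, repoRoot string) (string, bool) {
	m := sourceRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	p := m[1]
	// Drop the trailing ":<line number>" if present.
	if i := strings.LastIndex(p, ":"); i >= 0 {
		p = p[:i]
	}
	return strings.TrimPrefix(p, repoRoot), true
}

// rule is a simplified CODEOWNERS entry: a path prefix and its owners.
type rule struct {
	Prefix string
	Owners []string
}

// ownersFor returns the owners of the last rule whose prefix matches path,
// or nil if no rule matches.
func ownersFor(path string, rules []rule) []string {
	var owners []string
	for _, r := range rules {
		if strings.HasPrefix(path, r.Prefix) {
			owners = r.Owners
		}
	}
	return owners
}

func main() {
	rules := []rule{
		{Prefix: "", Owners: []string{"@example/fallback"}}, // hypothetical catch-all
		{Prefix: "pkg/datapath/", Owners: []string{"@cilium/sig-datapath"}},
	}
	line := `time=2025-04-08T15:50:26Z level=error source=/go/src/github.com/cilium/cilium/pkg/datapath/linux/node.go:189 msg="Updating tunnel map entry"`
	if p, ok := sourcePath(line, "/go/src/github.com/cilium/cilium/"); ok {
		fmt.Println(p, ownersFor(p, rules))
	}
}
```

Lines without a source field return `false` from `sourcePath`, so a caller can fall back to whatever default ownership the test case already has.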
Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
@marseel marseel force-pushed the pr/marseel/codeowners_for_error_logs branch from 73ffea5 to 96496ea Compare April 10, 2025 13:36

marseel commented Apr 10, 2025

/test

@marseel marseel requested a review from joestringer April 10, 2025 13:42
@joestringer joestringer enabled auto-merge April 10, 2025 14:35

marseel commented Apr 14, 2025

@derailed Friendly ping :)

@joestringer joestringer added this pull request to the merge queue Apr 15, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 15, 2025
Merged via the queue into cilium:main with commit b85b8bb Apr 15, 2025
69 checks passed
Labels
cilium-cli This PR contains changes related with cilium-cli ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Development

Successfully merging this pull request may close these issues.

CLI: Correlate check-log-errors failures with individual test runs
4 participants