Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
When running cilium-dbg status within cilium-agent, it outputs cluster health information with a list of nodes. Example output:
Cluster health: 10/12 reachable (2024-06-05T09:38:52Z)
Name IP Node Endpoints
cluster-1/node-1 172.21.114.19 unreachable unreachable
In this case, it did not show one of the unreachable nodes:
cluster-2/node-2 172.22.248.182 unreachable unreachable
It should have been listed: by default, the output shows up to 10 entries and prints (...) at the end to indicate partial results.
Thanks @bmcustodio for reporting.
Cilium Version
main
Kernel Version
N/A
Kubernetes Version
N/A
Regression
No
Sysdump
No response
Relevant log output
No response
Anything else?
cilium-dbg status is supposed to print up to 10 health lines:
cilium/cilium-dbg/cmd/status.go, line 41 in 5f2e61a:
healthLines = 10
However, it does not count the number of printed lines correctly:
- It does not take into account whether a line was actually printed for a node, which depends on the succinct/verbose flags. In the example output, it printed only a single line even though a second node was also unhealthy.
- It does not skip localhost when counting in the for loop, so localhost is counted twice: first in https://github.com/cilium/cilium/blob/5f2e61a1cce749619b15aacd957641cf10814a33/pkg/health/client/client.go#L382C1-L385C3
and then again in the loop iterating over all nodes.
Also, the condition for printing "(...)" is incorrect:
https://github.com/cilium/cilium/blob/5f2e61a1cce749619b15aacd957641cf10814a33/pkg/health/client/client.go#L400C1-L402C3
When printing all nodes, including healthy ones, it does not account for cases such as all nodes being healthy. The check then becomes len(sr.Nodes)-healthy = 0 > maxLines, which is always false, so "(...)" is never printed even when more than maxLines nodes were left out.
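The arithmetic can be checked with a minimal sketch. The helper names are hypothetical. ellipsisOld restates the quoted condition len(sr.Nodes)-healthy > maxLines. ellipsisExpected is an illustrative alternative, assumed here rather than taken from the source, that compares the number of lines the output would want (every node when printing all nodes, otherwise only the unhealthy ones) against maxLines.

```go
package main

// ellipsisOld restates the quoted check: len(sr.Nodes)-healthy > maxLines.
func ellipsisOld(totalNodes, healthy, maxLines int) bool {
	return totalNodes-healthy > maxLines
}

// linesWanted is the number of node lines the output would contain without
// a limit: every node when all nodes are printed, otherwise only the
// unhealthy ones. Hypothetical helper.
func linesWanted(totalNodes, healthy int, allNodes bool) int {
	if allNodes {
		return totalNodes
	}
	return totalNodes - healthy
}

// ellipsisExpected reports whether some lines must be cut off, so that
// "(...)" should be printed. Hypothetical helper.
func ellipsisExpected(totalNodes, healthy, maxLines int, allNodes bool) bool {
	return linesWanted(totalNodes, healthy, allNodes) > maxLines
}
```

For 12 healthy nodes printed with allNodes and maxLines = 10, ellipsisOld(12, 12, 10) evaluates 0 > 10 and returns false, although only 10 of 12 lines fit. ellipsisExpected(12, 12, 10, true) returns true.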
Proposed design:
Instead of counting and trying to guess whether a line was printed, let's have formatNodeStatus return a boolean indicating whether it printed a line. Once maxLines lines have been printed, we stop printing the remaining lines, but still check whether there are remaining nodes that we did not iterate over, and if so print "(...)" at the end.
We can also refactor the flags of formatNodeStatus a bit and get rid of the succinct flag, since it is always set to true.
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct