cilium-dbg status: Cluster health reporting unintuitive/incorrect results #33697

@marseel

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When cilium-dbg status is run inside the cilium-agent, it prints cluster health information with a list of nodes. Example output:

Cluster health:                            10/12 reachable   (2024-06-05T09:38:52Z)
  Name                                     IP                Node          Endpoints
  cluster-1/node-1                         172.21.114.19     unreachable   unreachable

The summary reports 10/12 reachable, so two nodes are unreachable, but only one is listed. The output did not show the other unreachable node:

  cluster-2/node-2                         172.22.248.182    unreachable   unreachable

even though, by default, it is supposed to show up to 10 entries and print (...) at the end to indicate partial results.

Thanks @bmcustodio for reporting.

Cilium Version

main

Kernel Version

N/A

Kubernetes Version

N/A

Regression

No

Sysdump

No response

Relevant log output

No response

Anything else?

cilium-dbg status is supposed to print up to 10 health lines:

healthLines = 10

In the code at

https://github.com/cilium/cilium/blob/5f2e61a1cce749619b15aacd957641cf10814a33/pkg/health/client/client.go#L391C1-L402C3

we can see that it does not count the number of printed lines correctly.

Also, the condition for printing "(...)" is incorrect:
https://github.com/cilium/cilium/blob/5f2e61a1cce749619b15aacd957641cf10814a33/pkg/health/client/client.go#L400C1-L402C3
When all nodes are printed, including healthy ones, it does not account for cases such as all nodes being healthy.
The check then becomes len(sr.Nodes)-healthy = 0 > maxLines, which is always false.
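
To make the arithmetic concrete, here is a minimal sketch that reduces the check described above to plain integers. The function name truncatedByIssueCheck and its parameters are hypothetical and are not taken from client.go; only the expression len(sr.Nodes)-healthy > maxLines comes from this issue.

```go
package main

import "fmt"

// truncatedByIssueCheck is a hypothetical reduction of the check quoted in
// this issue: len(sr.Nodes)-healthy > maxLines. It is not the real code.
func truncatedByIssueCheck(totalNodes, healthy, maxLines int) bool {
	return totalNodes-healthy > maxLines
}

func main() {
	// 12 nodes, all healthy, all nodes requested, limit of 10 lines:
	// only 10 of 12 lines fit, yet 12-12 = 0 > 10 is false, so no "(...)".
	fmt.Println(truncatedByIssueCheck(12, 12, 10)) // false

	// The check only fires when more than maxLines nodes are unhealthy.
	fmt.Println(truncatedByIssueCheck(12, 0, 10)) // true
}
```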

Proposed design:
Instead of counting and trying to guess whether a line was printed, let formatNodeStatus return a boolean indicating whether it printed a line. Once maxLines lines have been printed, we stop printing, but still check whether there are remaining nodes that we did not iterate over, and if so print "(...)" at the end.
We can also refactor the flags of formatNodeStatus a bit and drop the succinct flag, since it is always set to true.
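
The proposal could look roughly like the sketch below. It is an illustration under assumptions, not the actual Cilium code: the nodeStatus type, the simplified formatNodeStatus signature, and the helpers printNodes, render and makeNodes are all hypothetical, and the IP addresses are made up. As in the proposal, "(...)" is printed whenever nodes remain that were not iterated over, without checking whether those nodes would have produced a line.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// nodeStatus is a hypothetical, simplified stand-in for the per-node health
// data printed by cilium-dbg status.
type nodeStatus struct {
	Name      string
	IP        string
	Reachable bool
}

// formatNodeStatus writes at most one line for n and reports whether it
// wrote one. Healthy nodes are skipped unless allNodes is set.
func formatNodeStatus(w io.Writer, n nodeStatus, allNodes bool) bool {
	if n.Reachable && !allNodes {
		return false
	}
	state := "unreachable"
	if n.Reachable {
		state = "reachable"
	}
	fmt.Fprintf(w, "  %s\t%s\t%s\n", n.Name, n.IP, state)
	return true
}

// printNodes prints up to maxLines node lines. If the limit is reached while
// nodes are still left to visit, it prints "(...)" and stops.
func printNodes(w io.Writer, nodes []nodeStatus, allNodes bool, maxLines int) {
	printed := 0
	for _, n := range nodes {
		if printed >= maxLines {
			// At least this node was never visited: mark partial output.
			fmt.Fprintln(w, "  (...)")
			return
		}
		if formatNodeStatus(w, n, allNodes) {
			printed++
		}
	}
}

// render is a hypothetical helper that captures the output as a string.
func render(nodes []nodeStatus, allNodes bool, maxLines int) string {
	var sb strings.Builder
	printNodes(&sb, nodes, allNodes, maxLines)
	return sb.String()
}

// makeNodes builds test data; the last `unreachable` nodes are unreachable.
func makeNodes(total, unreachable int) []nodeStatus {
	nodes := make([]nodeStatus, 0, total)
	for i := 0; i < total; i++ {
		nodes = append(nodes, nodeStatus{
			Name:      fmt.Sprintf("cluster-1/node-%d", i+1),
			IP:        fmt.Sprintf("10.0.0.%d", i+1),
			Reachable: i < total-unreachable,
		})
	}
	return nodes
}

func main() {
	// Scenario from this issue: 10/12 reachable, both unreachable nodes shown.
	fmt.Print(render(makeNodes(12, 2), false, 10))
	// All nodes healthy, all nodes requested: 10 lines followed by "(...)".
	fmt.Print(render(makeNodes(12, 0), true, 10))
}
```

Because the counter only advances when formatNodeStatus reports that it wrote a line, skipped healthy nodes no longer consume the line budget.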

Labels

  • good-first-issue: Good starting point for new developers, which requires minimal understanding of Cilium.
  • help-wanted: Please volunteer for this by adding yourself as an assignee!
  • kind/enhancement: This would improve or streamline existing functionality.

Projects

Status

Done
