Improve Hubble Relay Kubernetes Readiness/Liveness check #23542

@gandro

At the moment, Hubble Relay's Kubernetes readiness and liveness checks simply test whether the TCP port is open:

readinessProbe:
  tcpSocket:
    port: grpc
livenessProbe:
  tcpSocket:
    port: grpc

This kind of check says little about whether Hubble Relay is actually ready: an open port only shows that the process is listening. Instead, users have to manually inspect Relay's logs for common errors, such as no access to the peer service, mTLS authentication failures, broken pod-to-host connectivity, or no connection to any Hubble Observer.

We should introduce a more meaningful health check that verifies some basic conditions, for example:

  • Have we received cluster information from the peer service?
  • Are we connected to a Hubble Observer on at least one node?
  • (Bonus): Do the connected Hubble Observers contain any flows in their ring buffer?

@kaworu mentioned that this health check could be exposed via the gRPC endpoint, which an upcoming Kubernetes version seems to support. This means we do not have to create a custom HTTP server just for this. @kaworu feel free to provide more details.
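
If Relay implemented the standard gRPC health checking protocol (`grpc.health.v1.Health`) on its existing endpoint, the probes could switch from `tcpSocket` to Kubernetes' built-in gRPC probe type. The fragment below is a minimal sketch under stated assumptions, not a tested configuration: the port number 4245 is assumed to be Relay's listen port (gRPC probes take a numeric port rather than a named one such as `grpc`), and the timing values are purely illustrative.

```yaml
# Sketch only: assumes Relay serves grpc.health.v1.Health on its gRPC port
# and that the cluster's Kubernetes version supports gRPC probes.
readinessProbe:
  grpc:
    port: 4245        # assumed Relay listen port; gRPC probes need a number
  periodSeconds: 10   # illustrative value
livenessProbe:
  grpc:
    port: 4245        # same assumption as above
  periodSeconds: 10   # illustrative value
```

With this shape, the health service would report a non-serving status until the checks listed above pass, so no separate HTTP server is needed.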

For older versions of Kubernetes, we could provide a hubble-relay healthz sub-command and use a command-based (exec) readiness/liveness probe.
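
A command-based probe for that fallback might look like the fragment below. This is a sketch only: the `healthz` sub-command is the proposal from this issue and is hypothetical, the binary is assumed to be on the container's `PATH`, and the timeout values are illustrative.

```yaml
# Sketch only: "hubble-relay healthz" is a proposed, hypothetical sub-command.
readinessProbe:
  exec:
    command: ["hubble-relay", "healthz"]  # assumed to exit 0 when healthy
  timeoutSeconds: 5                       # illustrative value
livenessProbe:
  exec:
    command: ["hubble-relay", "healthz"]
  timeoutSeconds: 5                       # illustrative value
```

Kubernetes treats an exec probe as successful when the command exits with status 0 and as failed otherwise, so the sub-command would only need to map its check results to an exit code.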

Labels

  • area/agent: Cilium agent related.
  • help-wanted: Please volunteer for this by adding yourself as an assignee!
  • kind/enhancement: This would improve or streamline existing functionality.
  • pinned: These issues are not marked stale by our issue bot.
