goldentooth/grafana-dashboards

Grafana Dashboards

Curated collection of Grafana dashboards for monitoring the Goldentooth Raspberry Pi cluster infrastructure.

Overview

This repository contains production-ready Grafana dashboards designed specifically for monitoring a Raspberry Pi cluster running Kubernetes, the HashiCorp stack (Consul, Nomad, Vault), and various observability tools.

Available Dashboards

hashicorp-services-overview.json

Comprehensive overview of HashiCorp services:

  • Consul: Cluster membership, service health, KV store metrics
  • Nomad: Job status, resource allocation, client health
  • Vault: Secret engine metrics, authentication status, token usage

infrastructure-health-overview.json

High-level infrastructure monitoring:

  • Node Health: CPU, memory, disk usage across all Pi nodes
  • Network Performance: Latency, bandwidth, connectivity status
  • Service Availability: Uptime metrics for critical cluster services
  • Storage: NFS exports, ZFS pool status, disk I/O

prometheus-node-exporter.json

Detailed system metrics from node_exporter:

  • Hardware Monitoring: Temperature, voltage, frequency scaling
  • Resource Utilization: Per-node CPU, memory, disk, network
  • Process Monitoring: System load, context switches, interrupts
  • Filesystem Details: Mount points, inode usage, disk space

slurm-cluster-overview.json

SLURM workload manager dashboard:

  • Job Queue: Pending, running, completed job metrics
  • Partition Status: Node allocation, resource availability
  • User Activity: Job submission patterns, resource consumption
  • Cluster Efficiency: Utilization rates, queue times

Integration

These dashboards are automatically provisioned through the Goldentooth Ansible role:

goldentooth setup_grafana

They integrate with:

  • Prometheus: Primary metrics collection
  • Node Exporter: System-level metrics
  • Blackbox Exporter: Service availability monitoring
  • Custom Exporters: SLURM, HashiCorp services

Deployment

Dashboards are automatically deployed via Ansible to /var/lib/grafana/dashboards/ and configured through provisioning files. Updates are applied through the cluster management pipeline.
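
As a rough illustration of what such a provisioning file can look like, the sketch below renders a minimal Grafana file-based dashboard provider that points at the deployment directory. The provider name `goldentooth` and the 30-second rescan interval are assumptions made for this example; the actual file is generated by the Ansible role and may differ.

```python
def render_dashboard_provider(name, path, update_interval_seconds=30):
    """Render a minimal Grafana dashboard provider definition as YAML text.

    The field names follow Grafana's file-based dashboard provisioning format.
    The values passed in below are illustrative, not taken from this repository.
    """
    return (
        "apiVersion: 1\n"
        "providers:\n"
        f"  - name: {name}\n"
        "    type: file\n"
        "    disableDeletion: false\n"
        f"    updateIntervalSeconds: {update_interval_seconds}\n"
        "    options:\n"
        f"      path: {path}\n"
    )


# "goldentooth" is a hypothetical provider name; the path comes from this README.
provider_yaml = render_dashboard_provider("goldentooth", "/var/lib/grafana/dashboards")
print(provider_yaml)
```

Grafana reads provider definitions like this from its provisioning directory and loads every JSON file found under `path`.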

Customization

Each dashboard includes:

  • Variable Templating: Node selection, time ranges, service filters
  • Alert Annotations: Integration with Prometheus Alertmanager
  • Panel Descriptions: Detailed explanations of metrics and thresholds
  • Responsive Layout: Optimized for different screen sizes
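
These features live in well-known places in the Grafana dashboard JSON model: template variables under `templating.list` and panel descriptions in each panel's `description` field. The sketch below is a hypothetical helper, not part of this repository, that lists a dashboard's variables and flags panels without a description. The example dashboard is made up for illustration.

```python
def iter_panels(panels):
    """Yield every panel, including panels nested inside collapsed rows."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def summarize_dashboard(dashboard):
    """Return template variable names and titles of panels lacking a description."""
    variables = [
        variable.get("name")
        for variable in dashboard.get("templating", {}).get("list", [])
    ]
    undocumented = [
        panel.get("title", "<untitled>")
        for panel in iter_panels(dashboard.get("panels", []))
        # Row panels only group other panels, so they are not expected to carry a description.
        if panel.get("type") != "row" and not panel.get("description")
    ]
    return {"variables": variables, "panels_without_description": undocumented}


# Made-up dashboard fragment; real dashboards in this repository contain many more fields.
example = {
    "title": "Example",
    "templating": {"list": [{"name": "node", "type": "query"},
                            {"name": "service", "type": "custom"}]},
    "panels": [
        {"type": "row", "title": "CPU", "panels": [
            {"type": "timeseries", "title": "CPU usage",
             "description": "Per-node CPU utilization"},
        ]},
        {"type": "timeseries", "title": "Memory"},
    ],
}

summary = summarize_dashboard(example)
print(summary)
```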

For cluster-specific customizations, modify the dashboard JSON files and redeploy through Ansible.
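
Hand-edited JSON is easy to break, so a quick check before redeploying can help. The following is a minimal sketch, assuming the dashboards sit together in one directory. `check_dashboards` is a hypothetical helper, not something this repository ships. It reports files that fail to parse, dashboards without a `title`, and `uid` values used by more than one file.

```python
import json
from pathlib import Path


def check_dashboards(directory):
    """Return a list of human-readable problems found in directory/*.json."""
    problems = []
    seen_uids = {}  # uid -> name of the first file that used it
    for path in sorted(Path(directory).glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg}, line {exc.lineno})")
            continue
        if not isinstance(dashboard, dict):
            problems.append(f"{path.name}: top level is not a JSON object")
            continue
        if not dashboard.get("title"):
            problems.append(f"{path.name}: missing title")
        uid = dashboard.get("uid")
        if uid is None:
            continue  # a uid is optional in the dashboard JSON model
        if uid in seen_uids:
            problems.append(f"{path.name}: uid {uid!r} already used by {seen_uids[uid]}")
        else:
            seen_uids[uid] = path.name
    return problems
```

Calling `check_dashboards(".")` from the repository root returns an empty list when every file passes these checks.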
