Grafana Dashboards

A curated collection of Grafana dashboards for monitoring the Goldentooth Raspberry Pi cluster infrastructure.

Overview

This repository contains production-ready Grafana dashboards designed specifically for monitoring a Raspberry Pi cluster that runs Kubernetes, the HashiCorp stack (Consul, Nomad, Vault), and various observability tools.

Available Dashboards

`hashicorp-services-overview.json`

Comprehensive overview of HashiCorp services:

Consul: Cluster membership, service health, KV store metrics
Nomad: Job status, resource allocation, client health
Vault: Secret engine metrics, authentication status, token usage

`infrastructure-health-overview.json`

High-level infrastructure monitoring:

Node Health: CPU, memory, disk usage across all Pi nodes
Network Performance: Latency, bandwidth, connectivity status
Service Availability: Uptime metrics for critical cluster services
Storage: NFS exports, ZFS pool status, disk I/O

`prometheus-node-exporter.json`

Detailed system metrics from node_exporter:

Hardware Monitoring: Temperature, voltage, frequency scaling
Resource Utilization: Per-node CPU, memory, disk, network
Process Monitoring: System load, context switches, interrupts
Filesystem Details: Mount points, inode usage, disk space

`slurm-cluster-overview.json`

SLURM workload manager dashboard:

Job Queue: Pending, running, completed job metrics
Partition Status: Node allocation, resource availability
User Activity: Job submission patterns, resource consumption
Cluster Efficiency: Utilization rates, queue times

Integration

These dashboards are automatically provisioned through the Goldentooth Ansible role:

`goldentooth setup_grafana`

They integrate with:

Prometheus: Primary metrics collection
Node Exporter: System-level metrics
Blackbox Exporter: Service availability monitoring
Custom Exporters: SLURM, HashiCorp services
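
To confirm that the exporters listed above are actually being scraped, one option is to ask Prometheus for its built-in `up` metric over the HTTP API. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and the Prometheus base URL is an assumption that you would replace with your own.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Return the Prometheus instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def down_targets(payload: dict) -> list[tuple[str, str]]:
    """Extract sorted (job, instance) pairs whose `up` sample is 0."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for sample in payload["data"]["result"]:
        # Each instant-vector sample is [timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            labels = sample["metric"]
            down.append((labels.get("job", "?"), labels.get("instance", "?")))
    return sorted(down)


def fetch_down_targets(base_url: str, timeout: float = 5.0) -> list[tuple[str, str]]:
    """Query a live Prometheus server (base_url is an assumption) for down targets."""
    url = build_query_url(base_url, "up")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return down_targets(json.load(response))
```

Calling `fetch_down_targets("http://localhost:9090")` (a placeholder address) would return the scrape targets Prometheus currently reports as down. An empty list suggests every configured exporter is reachable.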

Deployment

Dashboards are automatically deployed via Ansible to `/var/lib/grafana/dashboards/` and configured through provisioning files. Updates are applied through the cluster management pipeline.
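
If you want to check whether the deployed copies still match a local checkout, a small comparison script can help. This sketch is not part of the Ansible role. The function names are hypothetical, and it assumes you can read both the checkout and the deployment directory from the machine where it runs.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def dashboard_drift(repo_dir: Path, deployed_dir: Path) -> dict[str, str]:
    """Map each *.json file in repo_dir to 'ok', 'differs', or 'missing'.

    Only files present in repo_dir are reported; extra files in
    deployed_dir are ignored in this sketch.
    """
    report = {}
    for source in sorted(repo_dir.glob("*.json")):
        target = deployed_dir / source.name
        if not target.is_file():
            report[source.name] = "missing"
        elif file_digest(source) != file_digest(target):
            report[source.name] = "differs"
        else:
            report[source.name] = "ok"
    return report
```

For example, `dashboard_drift(Path("."), Path("/var/lib/grafana/dashboards"))` reports which dashboards in the current directory are missing from, or out of date on, the Grafana host.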

Customization

Each dashboard includes:

Variable Templating: Node selection, time ranges, service filters
Alert Annotations: Integration with Prometheus Alertmanager
Panel Descriptions: Detailed explanations of metrics and thresholds
Responsive Layout: Optimized for different screen sizes

For cluster-specific customizations, modify the dashboard JSON files and redeploy through Ansible.
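
Before redeploying an edited dashboard, it can be worth checking that the file still parses and keeps its basic shape. The sketch below is illustrative only. It assumes the usual Grafana dashboard JSON layout, with top-level `title` and `panels` keys and template variables under `templating.list`. The function names and the choice of required keys are hypothetical, not something this repository defines.

```python
import json
from pathlib import Path

# Assumed minimum for a usable dashboard; adjust to your own conventions.
REQUIRED_KEYS = ("title", "panels")


def check_dashboard(dashboard: dict) -> list[str]:
    """Return a list of problems found in a parsed dashboard (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in dashboard]
    if not isinstance(dashboard.get("panels", []), list):
        problems.append("panels is not a list")
    return problems


def template_variables(dashboard: dict) -> list[str]:
    """Return the names of the dashboard's template variables, in order."""
    variables = dashboard.get("templating", {}).get("list", [])
    return [variable.get("name", "") for variable in variables]


def check_file(path: Path) -> list[str]:
    """Parse a dashboard file and return any problems found."""
    try:
        dashboard = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(dashboard, dict):
        return ["top-level value is not an object"]
    return check_dashboard(dashboard)
```

Running `check_file(Path("prometheus-node-exporter.json"))` after an edit returns an empty list when the file parses and has the expected keys. `template_variables` shows which variables (for example, a node selector) survived the change.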
