Description
Application contact email(s)
natan@robusta.dev,arik@robusta.dev
Trademark and accounts
- If the project is accepted, I agree to donate all project trademarks and accounts to the CNCF
Contributing or sponsoring entity contact email(s)
Project summary
HolmesGPT is an AI agent that automates cloud-native troubleshooting, bridging knowledge gaps by investigating alerts, executing runbooks, and correlating observability data in cloud-native platforms.
Project description
Troubleshooting cloud-native systems is inherently complex. A simple HTTP error might stem from an application bug, a misconfigured Kubernetes manifest, DNS resolution issues, or a downstream timeout - each living in a different layer of the stack. Engineers must piece together metrics, logs, and traces from multiple tools and run ad-hoc commands such as kubectl to gather additional context. On top of that, the knowledge needed to do so is spread across multiple teams.
While observability data is easily collected and indexed by tools like Prometheus, Loki, and Tempo, interpreting it still requires deep expertise. This knowledge gap is especially acute among newer teams: the 2024 CNCF survey found that 51% of moderately experienced cloud-native practitioners cited lack of training as a top challenge, ranking it above technical issues like networking or storage.
HolmesGPT helps close this gap by taking operational knowledge in the form of natural-language runbooks and executing them using large language models. The platform provides extensible integrations with open source and proprietary troubleshooting tools through native toolsets and external MCP (Model Context Protocol) servers. Users can customize the analysis by adding application knowledge and stack expertise, transforming HolmesGPT into a virtual SRE tailored to their environment.
CNCF end users can deploy HolmesGPT to improve troubleshooting across their engineering teams, using existing monitoring data with no additional instrumentation required.
Org repo URL (provide if all repos under the org are in scope of the application)
N/A
Project repo URL in scope of application
https://github.com/robusta-dev/holmesgpt/
Additional repos in scope of the application
N/A
Website URL
https://github.com/robusta-dev/holmesgpt/
Roadmap
Roadmap context
This is a joint roadmap with inputs from all organizations that maintain HolmesGPT (currently Robusta and Microsoft).
Contributing guide
https://github.com/robusta-dev/holmesgpt/blob/master/GOVERNANCE.md
Code of Conduct (CoC)
https://github.com/robusta-dev/holmesgpt/blob/master/CODE_OF_CONDUCT.md
Adopters
https://github.com/robusta-dev/holmesgpt/blob/master/ADOPTERS.md
Maintainers file
https://github.com/robusta-dev/holmesgpt/blob/master/MAINTAINERS.md
Security policy file
https://github.com/robusta-dev/holmesgpt/blob/master/SECURITY.md
IP policy
- If the project is accepted, I agree the project will follow the CNCF IP Policy
Will the project require a license exception?
N/A
Standard or specification?
N/A
Why CNCF?
HolmesGPT was built to help members of the cloud-native community troubleshoot their existing cloud-native stack faster.
Being part of the CNCF will drive the future success of HolmesGPT by deepening the connection with other CNCF projects, and driving collaboration and community development of more cloud-native integrations.
Finally, several partners and early adopters have expressed interest in contributing to HolmesGPT, but only if it’s governed under a neutral, vendor-independent foundation like the CNCF. Contributing the project to the CNCF will make collaboration easier and encourage broader adoption.
Benefit to the landscape
HolmesGPT uses AI to correlate cloud-native data dispersed across various tools and layers. Most end users operate with a combination of open source projects - Prometheus, Loki, and others - and commercial observability platforms - such as Datadog, Dynatrace, New Relic, and Splunk. Further context often resides outside these systems in ITSM tools like ServiceNow or Jira, chat platforms, and documentation.
For example, an engineer troubleshooting a sudden network failure might find no clear indicators in metrics, logs, or traces. The root cause turns out to be a firewall rule change documented in a ServiceNow ticket - completely invisible to observability tools. HolmesGPT helps surface this kind of context automatically.
By contributing HolmesGPT to the CNCF, we’re extending the landscape in a direction that aligns with where cloud-native operations are headed: intelligent and AI-driven.
Cloud native 'fit'
HolmesGPT is built specifically for cloud-native environments: it operates natively in Kubernetes, consumes telemetry from CNCF observability tools, and was built for end users running cloud-native applications.
It also exemplifies several cloud-native principles:
- Kubernetes-native: HolmesGPT can be deployed via Helm as a stateless service on Kubernetes
- Composable and loosely coupled: HolmesGPT integrates with a wide variety of tools inside and outside the CNCF landscape, via built-in integrations and support for MCP
Cloud native 'integration'
Runtime: HolmesGPT can run on Kubernetes
Data-sources: HolmesGPT can pull data from Prometheus, Loki, Tempo, ArgoCD, Helm, and more.
Scope (project goals): HolmesGPT helps users troubleshoot cloud-native environments, including Kubernetes
Cloud native overlap
HolmesGPT overlaps conceptually with two CNCF projects that use large language models, but with differences in scope and execution:
- K8sGPT
Overlap: Both HolmesGPT and K8sGPT aim to assist with Kubernetes troubleshooting using AI.
Difference: K8sGPT focuses on interpreting Kubernetes resource status and surfacing potential misconfigurations, whereas HolmesGPT is broader - it can query a wider variety of data sources, start investigations from plaintext (like K8sGPT) or structured data like Prometheus alerts, and can connect with external IT tools such as ServiceNow.
- Kagent
Overlap: Both HolmesGPT and Kagent aim to enable intelligent agents in Kubernetes environments using LLMs.
Difference: Kagent is focused on providing a general-purpose framework for building, hosting, and orchestrating agentic workflows in Kubernetes that interact with cloud-native tools. It includes example agents with some overlap with HolmesGPT. By contrast, HolmesGPT is an opinionated, production-ready agent focused on root cause analysis of cloud-native issues. This leads to a distinct feature set and roadmap. For example, HolmesGPT’s HTTP API can render rich output formats with Prometheus graphs embedded in them, assuming the client knows how to parse them. Furthermore, there are future plans to incorporate traditional machine learning and AIOps algorithms to reduce log volume, so that all logs from the cluster in the past hour can fit into a large language model’s context window for anomaly detection. HolmesGPT could use Kagent as a runtime or execution framework in the future, but the project’s focus will always be a specialized solution with built-in integrations and operational logic rather than an open-ended agent framework.
Similar projects
Kagent
K8sGPT
Landscape
No
Business Product or Service to Project separation
HolmesGPT started as an upstream open source project maintained by Robusta.dev that also functions as the backend for Robusta’s AI-powered root cause analysis features. While Robusta.dev was the first to build on top of HolmesGPT, it is not the only one - other vendors are already adopting and integrating HolmesGPT into their own solutions.
We’ve already taken concrete steps to separate HolmesGPT from other Robusta offerings. The open source project is production-ready, self-contained, and designed to be fully usable without any dependency on commercial services.
As part of this process, we are actively working to separate the HolmesGPT documentation from other Robusta projects. Today, the documentation is hosted alongside our broader offering, but we have started moving it to standalone docs to better reflect HolmesGPT’s independent status.
We are also in the process of updating the existing README and other materials to reflect HolmesGPT’s standalone identity.
Project "Domain Technical Review"
We plan to engage soon and will update here.
CNCF contacts
No response
Additional information
No response