Description
At present, our logs are problematic in that they provide little value either to operators or to our own team. The logs seem to have a mix of target audiences: mostly core developers, but secondarily operators.
One option is to use different log levels to output different kinds of information (e.g. operator-focused logs at info level and developer-focused logs at debug level).
My alternative proposal is to completely separate the logging/tracing output for each of these audiences:
- Let logs target operators exclusively, giving them actionable insight into what is happening in the system, especially when a failure requires operator intervention to recover.
- Introduce traces: a human- and machine-readable output format that is turned off by default and, when turned on, provides far more detailed, fine-grained information in order to:
  - Allow core developers to troubleshoot systems under test, or even systems in production, more easily.
  - Potentially facilitate model-based testing via our E2E tests, since the traces would be machine-readable.
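To make the proposed split concrete, here is a minimal Rust sketch. It assumes a hypothetical `Observability` type and a hypothetical `commit_block` step, neither of which exists in the codebase, that always records operator-facing logs but records developer-facing traces only when tracing is explicitly enabled. It deliberately uses only the standard library rather than any particular logging crate; in a real node the two outputs would go to separate sinks (e.g. stderr for logs, a trace file or endpoint for traces).

```rust
/// Hypothetical split between operator-facing logs and developer-facing traces.
/// All names here are illustrative only.
#[derive(Default)]
struct Observability {
    /// Operator logs: always on, meant to be short and actionable.
    logs: Vec<String>,
    /// Developer traces: `None` means tracing is off, which is the default.
    traces: Option<Vec<String>>,
}

impl Observability {
    /// Creates an instance with tracing enabled (e.g. via a hypothetical config flag).
    fn with_tracing() -> Self {
        Observability { logs: Vec::new(), traces: Some(Vec::new()) }
    }

    /// Records an operator-facing message.
    fn log(&mut self, msg: &str) {
        self.logs.push(msg.to_string());
    }

    /// Records a developer-facing event. The closure is only evaluated when
    /// tracing is enabled, so a disabled tracer does no formatting work.
    fn trace<F: FnOnce() -> String>(&mut self, event: F) {
        if let Some(traces) = self.traces.as_mut() {
            traces.push(event());
        }
    }
}

/// A hypothetical consensus step that emits both kinds of output.
fn commit_block(obs: &mut Observability, height: u64, round: u32) {
    // Fine-grained, machine-readable event for developers.
    obs.trace(|| format!(r#"{{"event":"commit","height":{height},"round":{round}}}"#));
    // Concise message for operators.
    obs.log(&format!("committed block at height {height}"));
}

fn main() {
    // Default node: logs only.
    let mut default_node = Observability::default();
    commit_block(&mut default_node, 10, 0);
    assert_eq!(default_node.logs.len(), 1);
    assert!(default_node.traces.is_none());

    // Node with tracing turned on: logs plus traces.
    let mut debug_node = Observability::with_tracing();
    commit_block(&mut debug_node, 10, 2);
    println!("log:   {}", debug_node.logs[0]);
    println!("trace: {}", debug_node.traces.as_ref().unwrap()[0]);
}
```

Passing each trace event as a closure means a node with tracing disabled never pays the cost of building detailed events, which matters if traces end up far more verbose than logs.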
The specific trace format that @josef-widder recommended looking into is the Informal Trace Format (ITF) (cc @konnov, @shonfeder). Traces could be exposed either through dedicated trace JSON files or through some form of streaming RPC/gRPC endpoint. We already have an ITF parser written in Rust: https://github.com/informalsystems/itf-rs
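As a rough illustration of what a trace file could contain, the following Rust sketch serializes a sequence of states into ITF-shaped JSON: a `vars` array naming the state variables and a `states` array with one object per state. It covers only a small subset of ITF values (booleans, strings, and integers encoded as `{"#bigint": "..."}`), the per-state `#meta` index is optional metadata, and the state variables (`height`, `round`, `step`, `decided`) are hypothetical placeholders, not taken from the consensus reactor. A real implementation would more likely use `serde`/`serde_json`, and its output should be checked against the ITF specification, for instance by round-tripping it through itf-rs.

```rust
use std::collections::BTreeMap;

/// A small subset of ITF values: booleans, integers and strings.
#[derive(Clone, Debug)]
enum Value {
    Bool(bool),
    Int(i64),
    Str(String),
}

/// Escapes a string for inclusion in JSON (quotes, backslashes, control chars).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl Value {
    /// Encodes a single value in ITF's JSON representation.
    fn to_itf(&self) -> String {
        match self {
            Value::Bool(b) => b.to_string(),
            // ITF represents integers as {"#bigint": "<decimal digits>"}.
            Value::Int(n) => format!("{{\"#bigint\":\"{n}\"}}"),
            Value::Str(s) => json_string(s),
        }
    }
}

/// Serializes states into an ITF-shaped JSON document.
/// `vars` lists the state variables; each state maps variables to values.
fn to_itf_json(vars: &[&str], states: &[BTreeMap<String, Value>]) -> String {
    let vars_json: Vec<String> = vars.iter().map(|v| json_string(v)).collect();
    let states_json: Vec<String> = states
        .iter()
        .enumerate()
        .map(|(i, state)| {
            // Optional per-state metadata, here just the state's position.
            let mut fields = vec![format!("\"#meta\":{{\"index\":{i}}}")];
            // BTreeMap iterates in key order, so output is deterministic.
            for (k, v) in state {
                fields.push(format!("{}:{}", json_string(k), v.to_itf()));
            }
            format!("{{{}}}", fields.join(","))
        })
        .collect();
    format!(
        "{{\"vars\":[{}],\"states\":[{}]}}",
        vars_json.join(","),
        states_json.join(",")
    )
}

fn main() {
    // Hypothetical state variables for a consensus-like process.
    let vars = ["height", "round", "step", "decided"];

    let mut s0 = BTreeMap::new();
    s0.insert("height".to_string(), Value::Int(1));
    s0.insert("round".to_string(), Value::Int(0));
    s0.insert("step".to_string(), Value::Str("propose".to_string()));
    s0.insert("decided".to_string(), Value::Bool(false));

    // Next state: only `step` changes.
    let mut s1 = s0.clone();
    s1.insert("step".to_string(), Value::Str("prevote".to_string()));

    println!("{}", to_itf_json(&vars, &[s0, s1]));
}
```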
If this seems like a reasonable way to help us understand the system's behavior under different conditions, the first deliverable would be an ADR describing the solution in more detail, along with a proof-of-concept (PoC) implementation (perhaps starting with traces of the consensus and/or mempool reactors' operations).