[DRAFT Proposal] bcc / tools repo evolution

__Status: Brainstorming, no decisions made. Community input very welcome.__

Summarizing a discussion between @yonghong-song, @brendangregg and me. A few observations about the current state of things were made:

1) The `bcc` repo contains _bcc the framework_, _libbpf tools_, and _bcc tools_. Folks who come to the repo to use the tools may think that _bcc the framework_ is a recommended way to write BPF programs in 2022. 

2) bcc and libbpf-tools are maintained by folks who aren't subject matter experts or power users of most tools. As a result, we don't do a good job keeping tools up-to-date with changes in the subsystems they're tracing. Furthermore, we tend to accept contributions that add functionality without much pushback, which will turn the tools into a messy 'dumping ground' in the long term.

3) It's nice to have a central repository of tools for a few reasons:
   * discoverability of "tools that dig into X" for folks just beginning to dive into BPF observability
   * easy to find "practical use of BPF feature Y" for folks writing their own programs
   * for core BPF developers, provides a corpus of real-world BPF programs to analyze (identify common patterns, see how proposed changes will affect tools, etc.) 

---

To address (1) and (2) without breaking (3), a proposal:

To keep the nice properties of (3), let's keep the bcc and libbpf tools in the `iovisor/bcc` repo. To clarify (1), let's move _bcc the framework_ into a separate repo (`iovisor/bcc-framework` or similar). Let's also make it clear that `bcc-framework` should not be used for new program development unless the program writer has a good reason, and encourage anyone who does have such a reason to reach out to us, so we can improve the `libbpf` ecosystem.

For (2), let's adopt a _toolkit_ vs _toolshed_ distinction for tools. Tools in the _toolkit_ will be actively maintained, ideally by an opinionated power user of the tool or someone familiar with the kernel bits the tool is tracing. Users of such tools can expect the tool to work on a reasonable variety of kernels and output meaningful data usable for production analysis.

The _toolshed_, on the other hand, will contain tools which are not actively maintained and thus _may not work at all_ or, if they do work, _may not output correct or meaningful data_, as kernel implementations of whatever they're tracing may have shifted over the years. These are not to be considered prod-ready, but can make their way into the _toolkit_ if someone finds them useful enough to polish up and maintain.

The _toolkit_ vs _toolshed_ distinction is inspired by Brendan's experience with DTraceToolkit \[0\].

Thoughts? Comments? Brendan, Yonghong, please feel free to correct any of this if I'm misrepresenting our convo.

  \[0\]: https://www.brendangregg.com/blog/2013-09-05/dtracetoolkit-0xx-mistakes.html . Specifically "Mistake 2. Too Many Scripts"

---

Edit history:
* `s/toolbox/toolkit` to match Brendan's blog post. Add link to Brendan's blog post.



[DRAFT Proposal] bcc / tools repo evolution #3976
