Description
The current experimental implementation of ztunnel suffers from a number of issues we'd like to address in the long term. A brief summary:
- xDS efficiency: The xDS configuration for Envoy relies on copying the entire config pipeline for each workload. We expect this to scale poorly in large clusters. It also makes xDS updates more expensive than they should be.
- Multi-tenancy: At the time of writing, we don't have good measurements of how well Envoy handles multi-tenancy issues (such as noisy neighbors) in the ztunnel case.
- Again, more measurements are needed, but due to the xDS efficiency issues mentioned above, we expect the RAM, CPU, and latency costs of ztunnel to be higher than we'd like.
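To make the scaling concern in the first bullet concrete, here is a rough back-of-the-envelope model, not a measurement. It compares holding one full copy of the config pipeline per workload against sharing one pipeline plus a small per-workload record. The function names and byte sizes are assumptions made up for illustration; real numbers would have to come from the measurements called for above.

```rust
/// Bytes of config held when every workload gets its own full copy of the pipeline.
fn copied_config_bytes(workloads: u64, pipeline_bytes: u64) -> u64 {
    workloads * pipeline_bytes
}

/// Bytes of config held when one pipeline is shared and each workload
/// only adds a small record of its own.
fn shared_config_bytes(workloads: u64, pipeline_bytes: u64, per_workload_bytes: u64) -> u64 {
    pipeline_bytes + workloads * per_workload_bytes
}

fn main() {
    // Assumed sizes, chosen only to show the shape of the two curves.
    let pipeline_bytes = 50_000;
    let per_workload_bytes = 500;
    for workloads in [10u64, 100, 1_000] {
        println!(
            "{workloads:>5} workloads: copied = {:>10} B, shared = {:>8} B",
            copied_config_bytes(workloads, pipeline_bytes),
            shared_config_bytes(workloads, pipeline_bytes, per_workload_bytes),
        );
    }
}
```

In this model the copied layout grows with workloads times pipeline size, and a change to the pipeline has to be rewritten once per workload, which is the update cost the bullet refers to.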
To address these issues, there are two paths forward we need to explore:
Evolving Envoy to better support the ztunnel
Envoy needs additional primitives that allow us to express the ztunnel more efficiently. Examples include a many-to-many HBONE transport socket, more efficient ways to set and retrieve metadata from HBONE headers without internal listeners, and better multi-tenancy properties.
Build a new L4 proxy
Depending on how heavy a lift the required Envoy changes are, we may be better off building a new purpose-built proxy. We'd likely do this in Rust, given its memory safety properties and the existing high-quality proxy implementations written in the language (e.g. Linkerd's).
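For a sense of what the core of an L4 proxy looks like in Rust, here is a minimal sketch using only the standard library. It is a plain TCP forwarder under simplifying assumptions: no HBONE, no mTLS, no xDS, and one thread per direction where a real proxy would more likely use an async runtime. All function names are hypothetical, and the demo uses a local echo server as a stand-in for a destination workload.

```rust
use std::io::{self, Read, Write};
use std::net::{Shutdown, SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Copy bytes from `from` to `to` until EOF, then half-close `to`
/// so the peer also sees the end of the stream.
fn pipe(mut from: TcpStream, mut to: TcpStream) -> io::Result<u64> {
    let copied = io::copy(&mut from, &mut to)?;
    let _ = to.shutdown(Shutdown::Write);
    Ok(copied)
}

/// Forward one client connection to `upstream` in both directions.
fn forward(client: TcpStream, upstream: SocketAddr) -> io::Result<()> {
    let server = TcpStream::connect(upstream)?;
    let (client_rx, server_tx) = (client.try_clone()?, server.try_clone()?);
    // client -> upstream runs on its own thread; upstream -> client runs here.
    let up = thread::spawn(move || pipe(client_rx, server_tx));
    let _ = pipe(server, client);
    let _ = up.join();
    Ok(())
}

/// Accept loop: one thread per accepted connection.
fn serve(listener: TcpListener, upstream: SocketAddr) {
    for conn in listener.incoming() {
        if let Ok(conn) = conn {
            thread::spawn(move || {
                let _ = forward(conn, upstream);
            });
        }
    }
}

/// Send `payload` through the proxy to a local echo server and return the reply.
fn roundtrip_demo(payload: &[u8]) -> io::Result<Vec<u8>> {
    // Echo server standing in for a destination workload (serves one connection).
    let echo = TcpListener::bind("127.0.0.1:0")?;
    let echo_addr = echo.local_addr()?;
    thread::spawn(move || {
        if let Ok((stream, _)) = echo.accept() {
            if let Ok(writer) = stream.try_clone() {
                let _ = pipe(stream, writer);
            }
        }
    });

    // The proxy listens on an ephemeral port and forwards to the echo server.
    let proxy = TcpListener::bind("127.0.0.1:0")?;
    let proxy_addr = proxy.local_addr()?;
    thread::spawn(move || serve(proxy, echo_addr));

    let mut client = TcpStream::connect(proxy_addr)?;
    client.write_all(payload)?;
    client.shutdown(Shutdown::Write)?; // signal that the request is complete
    let mut reply = Vec::new();
    client.read_to_end(&mut reply)?;
    Ok(reply)
}

fn main() -> io::Result<()> {
    let reply = roundtrip_demo(b"hello ztunnel")?;
    println!("echoed {} bytes through the proxy", reply.len());
    Ok(())
}
```

The forwarding loop itself is small. Most of the engineering effort to be estimated would sit in what this sketch leaves out: HBONE tunneling, identity and mTLS, config distribution, and per-tenant resource isolation.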
Path Forward
Overall I expect work on this task to proceed in the following steps:
- Gather requirements for the ztunnel.
- Measure the current state of ztunnel against those requirements.
- Develop an engineering plan for an Envoy-based ztunnel implementation, and estimate the engineering effort required.
- Develop an engineering plan for a Rust implementation, and estimate the engineering effort required.
- Build one or both.
Affected product area (please put an X in all that apply)
[x] Ambient
[ ] Docs
[ ] Installation
[ ] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Affected features (please put an X in all that apply)
[ ] Multi Cluster
[ ] Virtual Machine
[ ] Multi Control Plane
Additional context