Distribute HTTP/2 calls across all API server instances #8810

@timebertt

How to categorize this issue?

/area control-plane networking scalability
/kind enhancement

Context

Since GEP-08, the istio ingress gateway forwards traffic to the respective shoot API server based on the TLS SNI header (see control plane endpoints). In this architecture, TLS connections are terminated by the individual API server instances.
Once a TLS connection is established, all requests sent over it end up on the same API server instance.
In other words, the istio ingress gateway performs L4 load balancing only; it doesn't distribute individual requests across API server instances.

Typical HTTP/2 implementations (e.g., in Go) reuse a single TCP/TLS connection as long as MAX_CONCURRENT_STREAMS is not reached (this was basically the promise of HTTP/2: to reuse a single L4 connection for many concurrent streams). As a result, a typical controller based on client-go sends all of its API requests to a single API server instance.
When HTTP/2 is disabled, however, client-go opens a pool of TCP/TLS connections instead and distributes API requests across these L4 connections. This yields a good distribution across API server instances, because the istio ingress gateway balances load on this layer (see the sketch below).
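
For illustration, a minimal client-go sketch of that workaround (an assumption about how a controller might apply it, not taken from the issue): restricting ALPN to `http/1.1` via `rest.Config`'s `TLSClientConfig.NextProtos` keeps the Go transport from negotiating h2, so client-go falls back to a pool of TCP/TLS connections.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a kubeconfig as usual (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Restricting ALPN to http/1.1 prevents the transport from negotiating h2.
	// client-go then maintains a pool of TCP/TLS connections and spreads
	// requests across them -- exactly what an L4 balancer can distribute.
	config.TLSClientConfig.NextProtos = []string{"http/1.1"}

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("nodes:", len(nodes.Items))
}
```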

Problem Statement

With the current architecture, we can't benefit from the HTTP/2 protocol for shoot API requests. In fact, enabling HTTP/2 can even carry a performance and scalability penalty compared to HTTP/1.1, yet HTTP/2 is used by default in most clients, including client-go.
We can observe an unequal distribution of API requests across API servers, especially in clusters with "big" operators that perform most of the API requests. This risks overloading individual API server instances while other instances are idling.

As a consequence, the efficiency of autoscaling the API server vertically is reduced, because the resource footprint of individual instances can differ significantly; VPA works best with equally sized instances.
Also, when API servers are terminated or rolled, the affected TLS connections are destroyed. Clients then reconnect and can flood a single instance with requests (especially re-list and watch requests) instead of spreading them across all remaining healthy instances.

Ideas

This is not a fully-fledged proposal yet. The issue should serve as a starting point for discussing the problem and ideas. Based on the feedback, we could create a full proposal later on.

We could introduce L7 load balancing for shoot API servers, i.e., multiplex HTTP/2 streams from a single TLS connection to multiple instances.
For this, we would need to terminate TLS connections earlier in the network path, in a proxy in front of the API server. This proxy could either be the existing istio ingress gateway (global proxy) or an additional proxy per shoot control plane (local proxy). It would open backend connections to all API server instances and multiplex incoming streams over these backend connections, as sketched below.
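
As a rough illustration of the idea (a sketch, not a proposed implementation; the backend addresses are hypothetical, and health checking, retries, and backend TLS verification are omitted), a Go reverse proxy that picks a backend per request, rather than per connection, could look like this:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// Hypothetical per-pod endpoints of one shoot's kube-apiserver replicas.
var backends = []*url.URL{
	{Scheme: "https", Host: "kube-apiserver-0.shoot--foo--bar:443"},
	{Scheme: "https", Host: "kube-apiserver-1.shoot--foo--bar:443"},
	{Scheme: "https", Host: "kube-apiserver-2.shoot--foo--bar:443"},
}

var next atomic.Uint64

func main() {
	proxy := &httputil.ReverseProxy{
		Rewrite: func(r *httputil.ProxyRequest) {
			// A backend is chosen per request, i.e., per HTTP/2 stream,
			// not per connection: the L7 multiplexing described above.
			r.SetURL(backends[next.Add(1)%uint64(len(backends))])
		},
	}

	// Terminating TLS here means the proxy, not the API server, presents the
	// serving certificate. Go's HTTP server negotiates h2 towards clients
	// automatically when serving TLS.
	panic(http.ListenAndServeTLS(":443", "tls.crt", "tls.key", proxy))
}
```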

In addition to presenting the expected server certificate to the client, the proxy would also need to translate L4 authentication (the TLS client certificate) into L7 authentication information, i.e., put the client certificate's CN/O into the --requestheader-* headers configured in the API server (similar to envoyproxy/envoy#6601).
Envoy already supports the XFCC (X-Forwarded-Client-Cert) header for this, but the API server doesn't understand it (kubernetes/kubernetes#78252).
Envoy could probably still be configured, e.g., via a WASM plugin, to pass the information in the format the API server expects; the translation itself amounts to roughly the sketch below.
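
Independent of whether Envoy or a dedicated proxy performs it, a minimal Go middleware sketch of this translation could look as follows. The header names match the API server's conventional --requestheader-* configuration; the package and function names are illustrative.

```go
package proxy

import "net/http"

// withRequestHeaderAuth copies the verified client certificate's CN/O into
// the headers an API server configured with
// --requestheader-username-headers=X-Remote-User and
// --requestheader-group-headers=X-Remote-Group would accept. The proxy itself
// must authenticate towards the API server with a certificate signed by the
// CA passed via --requestheader-client-ca-file.
func withRequestHeaderAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Never trust identity headers supplied by the client itself.
		r.Header.Del("X-Remote-User")
		r.Header.Del("X-Remote-Group")

		// r.TLS.PeerCertificates is only populated when the server's
		// tls.Config requests and verifies client certificates against the
		// cluster's client CA (e.g., ClientAuth: tls.VerifyClientCertIfGiven).
		if r.TLS != nil && len(r.TLS.PeerCertificates) > 0 {
			leaf := r.TLS.PeerCertificates[0]
			r.Header.Set("X-Remote-User", leaf.Subject.CommonName)
			for _, group := range leaf.Subject.Organization {
				r.Header.Add("X-Remote-Group", group)
			}
		}
		next.ServeHTTP(w, r)
	})
}
```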

Open Points
