Distribute HTTP/2 calls across all API server instances #8810

@timebertt

How to categorize this issue?

/area control-plane networking scalability
/kind enhancement

Context

Since GEP-08, the istio ingress gateway forwards traffic to the respective shoot API server based on the TLS SNI header (see control plane endpoints). In this architecture, TLS connections are terminated by the individual API server instances.
Once a TLS connection is established, all requests sent over it end up on the same API server instance.
In other words, the istio ingress gateway performs L4 load balancing only; it doesn't distribute individual requests across API server instances.

Typical HTTP/2 implementations (e.g., in Go) reuse a single TCP/TLS connection as long as MAX_CONCURRENT_STREAMS is not reached (this was basically the promise of HTTP/2: to reuse a single L4 connection for many concurrent streams). As a result, a typical controller based on client-go sends all of its API requests to a single API server instance.
When HTTP/2 is disabled, however, client-go opens a pool of TCP/TLS connections instead and distributes API requests across these L4 connections. This yields a good distribution across API server instances, because the istio ingress gateway balances load on this layer (see the sketch below).
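
For illustration, a minimal client-go sketch of that workaround (an assumption about how a controller might apply it, not taken from the issue): restricting ALPN to `http/1.1` via `rest.Config`'s `TLSClientConfig.NextProtos` keeps the Go transport from negotiating h2, so client-go falls back to a pool of TCP/TLS connections.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a kubeconfig as usual (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Restricting ALPN to http/1.1 prevents the transport from negotiating h2.
	// client-go then maintains a pool of TCP/TLS connections and spreads
	// requests across them -- exactly what an L4 balancer can distribute.
	config.TLSClientConfig.NextProtos = []string{"http/1.1"}

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("nodes:", len(nodes.Items))
}
```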

Problem Statement

With the current architecture, we can't benefit from the HTTP/2 protocol for shoot API requests. In fact, enabling HTTP/2 can even carry a performance and scalability penalty compared to HTTP/1.1, yet HTTP/2 is used by default in most clients, including client-go.
We can observe an unequal distribution of API requests across API servers, especially in clusters with "big" operators that perform most of the API requests. This risks overloading individual API server instances while other instances are idling.

As a consequence, the efficiency of autoscaling the API server vertically is reduced, because the resource footprint of individual instances can differ significantly; VPA works best with equally sized instances.
Also, when API servers are terminated or rolled, the affected TLS connections are destroyed. Clients then reconnect and can flood a single instance with requests (especially re-list and watch requests) instead of spreading them across all remaining healthy instances.

Ideas

This is not a fully-fledged proposal yet. The issue should serve as a starting point for discussing the problem and ideas. Based on the feedback, we could create a full proposal later on.

We could introduce L7 load balancing for shoot API servers, i.e., multiplex HTTP/2 streams from a single TLS connection to multiple instances.
For this, we would need to terminate TLS connections earlier in the network path, in a proxy in front of the API server. This proxy could either be the existing istio ingress gateway (global proxy) or an additional proxy per shoot control plane (local proxy). It would open backend connections to all API server instances and multiplex incoming streams over these backend connections, as sketched below.
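
As a rough illustration of the idea (a sketch, not a proposed implementation; the backend addresses are hypothetical, and health checking, retries, and backend TLS verification are omitted), a Go reverse proxy that picks a backend per request, rather than per connection, could look like this:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// Hypothetical per-pod endpoints of one shoot's kube-apiserver replicas.
var backends = []*url.URL{
	{Scheme: "https", Host: "kube-apiserver-0.shoot--foo--bar:443"},
	{Scheme: "https", Host: "kube-apiserver-1.shoot--foo--bar:443"},
	{Scheme: "https", Host: "kube-apiserver-2.shoot--foo--bar:443"},
}

var next atomic.Uint64

func main() {
	proxy := &httputil.ReverseProxy{
		Rewrite: func(r *httputil.ProxyRequest) {
			// A backend is chosen per request, i.e., per HTTP/2 stream,
			// not per connection: the L7 multiplexing described above.
			r.SetURL(backends[next.Add(1)%uint64(len(backends))])
		},
	}

	// Terminating TLS here means the proxy, not the API server, presents the
	// serving certificate. Go's HTTP server negotiates h2 towards clients
	// automatically when serving TLS.
	panic(http.ListenAndServeTLS(":443", "tls.crt", "tls.key", proxy))
}
```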

In addition to presenting the expected server certificate to the client, the proxy would also need to translate L4 authentication (the TLS client certificate) into L7 authentication information, i.e., put the client certificate's CN/O into the --requestheader-* headers configured in the API server (similar to envoyproxy/envoy#6601).
Envoy already supports the XFCC (X-Forwarded-Client-Cert) header for this, but the API server doesn't understand it (kubernetes/kubernetes#78252).
Envoy could probably still be configured, e.g., via a WASM plugin, to pass the information in the format the API server expects; the translation itself amounts to roughly the sketch below.
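
Independent of whether Envoy or a dedicated proxy performs it, a minimal Go middleware sketch of this translation could look as follows. The header names match the API server's conventional --requestheader-* configuration; the package and function names are illustrative.

```go
package proxy

import "net/http"

// withRequestHeaderAuth copies the verified client certificate's CN/O into
// the headers an API server configured with
// --requestheader-username-headers=X-Remote-User and
// --requestheader-group-headers=X-Remote-Group would accept. The proxy itself
// must authenticate towards the API server with a certificate signed by the
// CA passed via --requestheader-client-ca-file.
func withRequestHeaderAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Never trust identity headers supplied by the client itself.
		r.Header.Del("X-Remote-User")
		r.Header.Del("X-Remote-Group")

		// r.TLS.PeerCertificates is only populated when the server's
		// tls.Config requests and verifies client certificates against the
		// cluster's client CA (e.g., ClientAuth: tls.VerifyClientCertIfGiven).
		if r.TLS != nil && len(r.TLS.PeerCertificates) > 0 {
			leaf := r.TLS.PeerCertificates[0]
			r.Header.Set("X-Remote-User", leaf.Subject.CommonName)
			for _, group := range leaf.Subject.Organization {
				r.Header.Add("X-Remote-Group", group)
			}
		}
		next.ServeHTTP(w, r)
	})
}
```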

Open Points
