
@maci0 (Contributor) commented Jul 3, 2025

This pull request introduces support for Google Kubernetes Engine (GKE) Gateway API implementations, enabling users to leverage GKE's native L7 load balancers for both internal and regional external traffic.

  • GKE Gateway Policies:
    Integrates `GCPBackendPolicy` and `HealthCheckPolicy` resources into the Helm chart for the sample application.
    These GKE-specific policies are created only when `gateway.gatewayClassName` is set to a GKE class (e.g., `gke-l7-rilb`).

  • Installer Script Update:
    The `llmd-installer.sh` validation logic now recognizes `gke-l7-rilb` and `gke-l7-regional-external-managed` as valid gateway types.
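
The installer-side check can be pictured with a small shell sketch; the function name and the exact list of accepted types here are illustrative assumptions, not the actual `llmd-installer.sh` code:

```shell
#!/usr/bin/env sh
# Illustrative sketch only: validates a requested gateway type against an
# allow-list. The function name and the non-GKE entries are assumptions.
validate_gateway_type() {
  case "$1" in
    istio|kgateway|gke-l7-rilb|gke-l7-regional-external-managed)
      echo "Gateway type validated"
      return 0 ;;
    *)
      echo "Unsupported gateway type: $1" >&2
      return 1 ;;
  esac
}

validate_gateway_type gke-l7-rilb
```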

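The conditional policy creation in the first bullet could look roughly like the following Helm template sketch. Field names follow GKE's `networking.gke.io/v1` API, but the values paths, timeout, and target Service name are illustrative assumptions, not the chart's actual contents:

```yaml
# Hypothetical sketch only: renders a GCPBackendPolicy solely for GKE
# gateway classes. The values paths and the Service name are assumptions.
{{- if hasPrefix "gke-l7" .Values.gateway.gatewayClassName }}
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: {{ .Release.Name }}-backend-policy
spec:
  default:
    timeoutSec: 300          # generous timeout for long LLM requests
  targetRef:
    group: ""
    kind: Service
    name: {{ .Release.Name }}-decode   # assumed backend Service
{{- end }}
```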
install.log

user@w-mwysocki-mc1on5gh:~/llm-d-deployer/quickstart$ ./llmd-installer.sh -m -j gke-l7-rilb -f examples/base/slim/base-slim.yaml
ℹ️  📂 Setting up script environment...
ℹ️  kubectl can reach to a running Kubernetes cluster.
✅ HF_TOKEN validated
✅ Gateway type validated
ℹ️  🏗️ Installing GAIE Kubernetes infrastructure…
✅ 📜 Base CRDs: Installing...
Warning: resource customresourcedefinitions/gatewayclasses.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io configured
Warning: resource customresourcedefinitions/gateways.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io configured
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
Warning: resource customresourcedefinitions/httproutes.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io configured
Warning: resource customresourcedefinitions/referencegrants.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io configured
✅ 🚪 GAIE CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/inferencemodels.inference.networking.x-k8s.io created
customresourcedefinition.apiextensions.k8s.io/inferencepools.inference.networking.x-k8s.io created
✅ 🎒 Gateway provider 'gke-l7-rilb': Installing...
✅ GAIE infra applied
ℹ️  📦 Creating namespace llm-d...
namespace/llm-d created
✅ Namespace ready
ℹ️  🔹 Using merged values: /tmp/tmp.f5Y2i6sRfV
ℹ️  🔐 Creating/updating HF token secret...
secret/llm-d-hf-token created
✅ HF token secret created
ℹ️  Fetching OCP proxy UID...
ℹ️  No OpenShift SCC annotation found; defaulting PROXY_UID=0
ℹ️  📜 Applying modelservice CRD...
customresourcedefinition.apiextensions.k8s.io/modelservices.llm-d.ai unchanged
✅ ModelService CRD applied
ℹ️  ⏭️ Model download to PVC skipped: BYO model via HF repo_id selected.
protocol hf chosen - models will be downloaded JIT in inferencing pods.
"bitnami" already exists with the same configuration, skipping
ℹ️  🛠️ Building Helm chart dependencies...
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "llm-d" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 2 charts
Downloading common from repo https://charts.bitnami.com/bitnami
Downloading redis from repo https://charts.bitnami.com/bitnami
Pulled: registry-1.docker.io/bitnamicharts/redis:20.13.4
Digest: sha256:6a389e13237e8e639ec0d445e785aa246b57bfce711b087033a196a291d5c8d7
Deleting outdated charts
✅ Dependencies built
ℹ️  Metrics collection disabled by user request.
ℹ️  🚚 Deploying llm-d chart with /tmp/tmp.f5Y2i6sRfV...
Release "llm-d" does not exist. Installing it now.
NAME: llm-d
LAST DEPLOYED: Thu Jul  3 08:31:09 2025
NAMESPACE: llm-d
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing llm-d.

Your release is named `llm-d`.

To learn more about the release, try:

```bash
$ helm status llm-d
$ helm get all llm-d
```

Following presets are available to your users:

Name                                          Description
basic-gpu-preset                              Basic gpu inference
basic-gpu-with-nixl-preset                    GPU inference with NIXL P/D KV transfer and cache offloading
basic-gpu-with-nixl-and-redis-lookup-preset   GPU inference with NIXL P/D KV transfer, cache offloading and Redis lookup server
basic-sim-preset                              Basic simulation
✅ llm-d deployed
✅ 🎉 Installation complete.

### Workaround for https://github.com/llm-d/llm-d/pull/123

```bash
kubectl -n llm-d patch ModelService qwen-qwen3-0-6b --type='json' -p='[{"op": "add", "path": "/spec/decode/containers/0/env/-", "value": {"name": "PATH", "value": "/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/workspace/vllm/.vllm/bin:/root/.local/bin:/usr/local/ompi/bin"}}, {"op": "add", "path": "/spec/decode/containers/0/env/-", "value": {"name": "LD_LIBRARY_PATH", "value": "/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nixl/lib/x86_64-linux-gnu/:/usr/local/ompi/lib:/usr/lib:/usr/local/lib"}}]'

kubectl -n llm-d patch ModelService qwen-qwen3-0-6b --type='json' -p='[{"op": "add", "path": "/spec/prefill/containers/0/env/-", "value": {"name": "PATH", "value": "/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/workspace/vllm/.vllm/bin:/root/.local/bin:/usr/local/ompi/bin"}}, {"op": "add", "path": "/spec/prefill/containers/0/env/-", "value": {"name": "LD_LIBRARY_PATH", "value": "/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nixl/lib/x86_64-linux-gnu/:/usr/local/ompi/lib:/usr/lib:/usr/local/lib"}}]'
```


### tests


user@w-mwysocki-mc1on5gh:~/llm-d-deployer/quickstart$ kubectl -n llm-d get httproute
NAME              HOSTNAMES   AGE
qwen-qwen3-0-6b               8m35s

$ kubectl -n llm-d get gcpbackendpolicy
NAME                             AGE
qwen-qwen3-0-6b-backend-policy   8m35s

$ kubectl -n llm-d get healthcheckpolicies
NAME                                  AGE
qwen-qwen3-0-6b-health-check-policy   8m35s

$ kubectl -n llm-d get gateway
NAME                      CLASS         ADDRESS        PROGRAMMED   AGE
llm-d-inference-gateway   gke-l7-rilb   10.128.0.179   True         8m35s
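
For reference, a `HealthCheckPolicy` like the one verified above has roughly this shape under GKE's `networking.gke.io/v1` API; the port, request path, and target Service name below are illustrative assumptions, not the chart's actual values:

```yaml
# Sketch only: field names follow GKE's networking.gke.io/v1
# HealthCheckPolicy API; port, path, and Service name are assumed.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: qwen-qwen3-0-6b-health-check-policy
spec:
  default:
    config:
      type: HTTP
      httpHealthCheck:
        port: 8000            # assumed vLLM serving port
        requestPath: /health  # assumed readiness endpoint
  targetRef:
    group: ""
    kind: Service
    name: qwen-qwen3-0-6b     # assumed backend Service
```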

### more tests

user@w-mwysocki-mc1on5gh:~/llm-d-deployer/quickstart$ ./test-request.sh 
Namespace: llm-d
Model ID:  none; will be discover from first entry in /v1/models

1 -> Fetching available models from the decode pod at 10.108.2.7…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1751532186,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-b53064d73c694c089558a894f9ab0109","object":"model_permission","created":1751532186,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-8859" deleted

Discovered model to use: Qwen/Qwen3-0.6B

2 -> Sending a completion request to the decode pod at 10.108.2.7…
{"id":"cmpl-d69f176d871340f9a6d521504931f384","object":"text_completion","created":1751532189,"model":"Qwen/Qwen3-0.6B","choices":[{"index":0,"text":" Can you describe your background and experience?\n\nAs a new graduate, I have a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":20,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}pod "curl-1111" deleted

3 -> Fetching available models via the gateway at 10.128.0.179…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1751532192,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-466ceb04120345eeb7fbfc86304e189e","object":"model_permission","created":1751532192,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-4663" deleted


4 -> Sending a completion request via the gateway at 10.128.0.179 with model 'Qwen/Qwen3-0.6B'…
{"choices":[{"finish_reason":"length","index":0,"logprobs":null,"prompt_logprobs":null,"stop_reason":null,"text":" What do you do? What do you study? What is your goal?\n\nWhat"}],"created":1751532195,"id":"cmpl-144bb91750cc40d58efc6ffb173902ec","kv_transfer_params":null,"model":"Qwen/Qwen3-0.6B","object":"text_completion","usage":{"completion_tokens":16,"prompt_tokens":4,"prompt_tokens_details":null,"total_tokens":20}}pod "curl-4475" deleted

All tests complete.

@achandrasekar commented

Thanks for adding this @maci0! Looks good to me overall.

@nerdalert Please review when you get a chance.

cc @kfswain as well.

@nerdalert (Member) commented Jul 7, 2025

@maci0 awesome, ty for this and ty for the review @achandrasekar

Can you run `pre-commit run -a` and bump the chart to 1.0.21 in:

@maci0 (Contributor, Author) commented Jul 7, 2025

@nerdalert done :)

@nerdalert (Member) left a comment

LGTM ty!

@nerdalert merged commit e721ca3 into llm-d:main Jul 7, 2025
3 checks passed