
@maci0 (Contributor) commented Jul 3, 2025

This pull request introduces support for Google Kubernetes Engine (GKE) Gateway API implementations, enabling users to leverage GKE's native L7 load balancers for both internal and regional external traffic.

  • GKE Gateway Policies:
    Integrates `GCPBackendPolicy` and `HealthCheckPolicy` resources into the Helm chart for the sample application.
    These GKE-specific policies are created only when `gateway.gatewayClassName` is set to a GKE class (e.g., `gke-l7-rilb`).

  • Installer Script Update:
    The `llmd-installer.sh` validation logic now recognizes `gke-l7-rilb` and `gke-l7-regional-external-managed` as valid gateway types.
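
The installer-side check can be pictured with a small shell sketch; the function name and the exact list of accepted types here are illustrative assumptions, not the actual `llmd-installer.sh` code:

```shell
#!/usr/bin/env sh
# Illustrative sketch only: validates a requested gateway type against an
# allow-list. The function name and the non-GKE entries are assumptions.
validate_gateway_type() {
  case "$1" in
    istio|kgateway|gke-l7-rilb|gke-l7-regional-external-managed)
      echo "Gateway type validated"
      return 0 ;;
    *)
      echo "Unsupported gateway type: $1" >&2
      return 1 ;;
  esac
}

validate_gateway_type gke-l7-rilb
```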

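The conditional policy creation in the first bullet could look roughly like the following Helm template sketch. Field names follow GKE's `networking.gke.io/v1` API, but the values paths, timeout, and target Service name are illustrative assumptions, not the chart's actual contents:

```yaml
# Hypothetical sketch only: renders a GCPBackendPolicy solely for GKE
# gateway classes. The values paths and the Service name are assumptions.
{{- if hasPrefix "gke-l7" .Values.gateway.gatewayClassName }}
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: {{ .Release.Name }}-backend-policy
spec:
  default:
    timeoutSec: 300          # generous timeout for long LLM requests
  targetRef:
    group: ""
    kind: Service
    name: {{ .Release.Name }}-decode   # assumed backend Service
{{- end }}
```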
install.log

user@w-mwysocki-mc1on5gh:~/llm-d-deployer/quickstart$ ./llmd-installer.sh -m -j gke-l7-rilb -f examples/base/slim/base-slim.yaml
ℹ️  📂 Setting up script environment...
ℹ️  kubectl can reach to a running Kubernetes cluster.
✅ HF_TOKEN validated
✅ Gateway type validated
ℹ️  🏗️ Installing GAIE Kubernetes infrastructure…
✅ 📜 Base CRDs: Installing...
Warning: resource customresourcedefinitions/gatewayclasses.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io configured
Warning: resource customresourcedefinitions/gateways.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io configured
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
Warning: resource customresourcedefinitions/httproutes.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io configured
Warning: resource customresourcedefinitions/referencegrants.gateway.networking.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io configured
✅ 🚪 GAIE CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/inferencemodels.inference.networking.x-k8s.io created
customresourcedefinition.apiextensions.k8s.io/inferencepools.inference.networking.x-k8s.io created
✅ 🎒 Gateway provider 'gke-l7-rilb': Installing...
✅ GAIE infra applied
ℹ️  📦 Creating namespace llm-d...
namespace/llm-d created
✅ Namespace ready
ℹ️  🔹 Using merged values: /tmp/tmp.f5Y2i6sRfV
ℹ️  🔐 Creating/updating HF token secret...
secret/llm-d-hf-token created
✅ HF token secret created
ℹ️  Fetching OCP proxy UID...
ℹ️  No OpenShift SCC annotation found; defaulting PROXY_UID=0
ℹ️  📜 Applying modelservice CRD...
customresourcedefinition.apiextensions.k8s.io/modelservices.llm-d.ai unchanged
✅ ModelService CRD applied
ℹ️  ⏭️ Model download to PVC skipped: BYO model via HF repo_id selected.
protocol hf chosen - models will be downloaded JIT in inferencing pods.
"bitnami" already exists with the same configuration, skipping
ℹ️  🛠️ Building Helm chart dependencies...
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "llm-d" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 2 charts
Downloading common from repo https://charts.bitnami.com/bitnami
Downloading redis from repo https://charts.bitnami.com/bitnami
Pulled: registry-1.docker.io/bitnamicharts/redis:20.13.4
Digest: sha256:6a389e13237e8e639ec0d445e785aa246b57bfce711b087033a196a291d5c8d7
Deleting outdated charts
✅ Dependencies built
ℹ️  Metrics collection disabled by user request.
ℹ️  🚚 Deploying llm-d chart with /tmp/tmp.f5Y2i6sRfV...
Release "llm-d" does not exist. Installing it now.
NAME: llm-d
LAST DEPLOYED: Thu Jul  3 08:31:09 2025
NAMESPACE: llm-d
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing llm-d.

Your release is named `llm-d`.

To learn more about the release, try:

```bash
$ helm status llm-d
$ helm get all llm-d
```

Following presets are available to your users:

Name                                          Description
basic-gpu-preset                              Basic gpu inference
basic-gpu-with-nixl-preset                    GPU inference with NIXL P/D KV transfer and cache offloading
basic-gpu-with-nixl-and-redis-lookup-preset   GPU inference with NIXL P/D KV transfer, cache offloading and Redis lookup server
basic-sim-preset                              Basic simulation
✅ llm-d deployed
✅ 🎉 Installation complete.

### Workaround for https://github.com/llm-d/llm-d/pull/123

```bash
kubectl -n llm-d patch ModelService qwen-qwen3-0-6b --type='json' -p='[{"op": "add", "path": "/spec/decode/containers/0/env/-", "value": {"name": "PATH", "value": "/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/workspace/vllm/.vllm/bin:/root/.local/bin:/usr/local/ompi/bin"}}, {"op": "add", "path": "/spec/decode/containers/0/env/-", "value": {"name": "LD_LIBRARY_PATH", "value": "/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nixl/lib/x86_64-linux-gnu/:/usr/local/ompi/lib:/usr/lib:/usr/local/lib"}}]'

kubectl -n llm-d patch ModelService qwen-qwen3-0-6b --type='json' -p='[{"op": "add", "path": "/spec/prefill/containers/0/env/-", "value": {"name": "PATH", "value": "/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/workspace/vllm/.vllm/bin:/root/.local/bin:/usr/local/ompi/bin"}}, {"op": "add", "path": "/spec/prefill/containers/0/env/-", "value": {"name": "LD_LIBRARY_PATH", "value": "/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nixl/lib/x86_64-linux-gnu/:/usr/local/ompi/lib:/usr/lib:/usr/local/lib"}}]'
```


### tests


user@w-mwysocki-mc1on5gh:~/llm-d-deployer/quickstart$ kubectl -n llm-d get httproute
NAME              HOSTNAMES   AGE
qwen-qwen3-0-6b               8m35s

$ kubectl -n llm-d get gcpbackendpolicy
NAME                             AGE
qwen-qwen3-0-6b-backend-policy   8m35s

$ kubectl -n llm-d get healthcheckpolicies
NAME                                  AGE
qwen-qwen3-0-6b-health-check-policy   8m35s

$ kubectl -n llm-d get gateway
NAME                      CLASS         ADDRESS        PROGRAMMED   AGE
llm-d-inference-gateway   gke-l7-rilb   10.128.0.179   True         8m35s
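
For reference, a `HealthCheckPolicy` like the one verified above has roughly this shape under GKE's `networking.gke.io/v1` API; the port, request path, and target Service name below are illustrative assumptions, not the chart's actual values:

```yaml
# Sketch only: field names follow GKE's networking.gke.io/v1
# HealthCheckPolicy API; port, path, and Service name are assumed.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: qwen-qwen3-0-6b-health-check-policy
spec:
  default:
    config:
      type: HTTP
      httpHealthCheck:
        port: 8000            # assumed vLLM serving port
        requestPath: /health  # assumed readiness endpoint
  targetRef:
    group: ""
    kind: Service
    name: qwen-qwen3-0-6b     # assumed backend Service
```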

### more tests

user@w-mwysocki-mc1on5gh:~/llm-d-deployer/quickstart$ ./test-request.sh 
Namespace: llm-d
Model ID:  none; will be discover from first entry in /v1/models

1 -> Fetching available models from the decode pod at 10.108.2.7…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1751532186,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-b53064d73c694c089558a894f9ab0109","object":"model_permission","created":1751532186,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-8859" deleted

Discovered model to use: Qwen/Qwen3-0.6B

2 -> Sending a completion request to the decode pod at 10.108.2.7…
{"id":"cmpl-d69f176d871340f9a6d521504931f384","object":"text_completion","created":1751532189,"model":"Qwen/Qwen3-0.6B","choices":[{"index":0,"text":" Can you describe your background and experience?\n\nAs a new graduate, I have a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":20,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}pod "curl-1111" deleted

3 -> Fetching available models via the gateway at 10.128.0.179…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1751532192,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-466ceb04120345eeb7fbfc86304e189e","object":"model_permission","created":1751532192,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-4663" deleted


4 -> Sending a completion request via the gateway at 10.128.0.179 with model 'Qwen/Qwen3-0.6B'…
{"choices":[{"finish_reason":"length","index":0,"logprobs":null,"prompt_logprobs":null,"stop_reason":null,"text":" What do you do? What do you study? What is your goal?\n\nWhat"}],"created":1751532195,"id":"cmpl-144bb91750cc40d58efc6ffb173902ec","kv_transfer_params":null,"model":"Qwen/Qwen3-0.6B","object":"text_completion","usage":{"completion_tokens":16,"prompt_tokens":4,"prompt_tokens_details":null,"total_tokens":20}}pod "curl-4475" deleted

All tests complete.

@achandrasekar commented

Thanks for adding this @maci0! Looks good to me overall.

@nerdalert Please review when you get a chance.

cc @kfswain as well.

@nerdalert (Member) commented Jul 7, 2025

@maci0 awesome, ty for this and ty for the review @achandrasekar

Can you run `pre-commit run -a` and bump the chart to 1.0.21 in:

@maci0 (Contributor, Author) commented Jul 7, 2025

@nerdalert done :)

@nerdalert (Member) left a comment

LGTM ty!

@nerdalert merged commit e721ca3 into llm-d:main Jul 7, 2025
3 checks passed