docs: add docs for experimental inference gateway deployment #1533
Walkthrough
New documentation and Kubernetes resource manifests have been added.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Gateway
participant HTTPRoute
participant InferencePool
participant Service (EPP)
participant Pod (EPP)
User->>Gateway: HTTP/2 Inference Request (port 9002)
Gateway->>HTTPRoute: Match request path
HTTPRoute->>InferencePool: Route to backend (dynamo-deepseek)
InferencePool->>Service (EPP): Forward request
Service (EPP)->>Pod (EPP): Proxy to EPP container
Pod (EPP)->>Service (EPP): Model inference response
Service (EPP)->>InferencePool: Return response
InferencePool->>Gateway: Send response
Gateway->>User: Return inference result
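The Gateway → HTTPRoute → InferencePool hop in the diagram above might be expressed with a manifest along these lines. This is a minimal sketch, not the file from the PR: the route name `dynamo-deepseek-route` and the catch-all path match are assumptions for illustration, while `inference-gateway` and `dynamo-deepseek` are the names mentioned in the review.

```yaml
# Sketch only: route name and path match are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: dynamo-deepseek-route        # hypothetical name
  namespace: default
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway
      # namespace: <namespace>       # only needed if the Gateway lives in another namespace
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /                 # assumed catch-all match
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: dynamo-deepseek
```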
Actionable comments posted: 2
🧹 Nitpick comments (8)
deploy/inference-gateway/example/inference-gateway-resources.yaml (2)
58-66: Avoid using the default ServiceAccount for RBAC.
Binding the `default` SA cluster-wide is overly permissive. Define a dedicated ServiceAccount and update the ClusterRoleBinding and Deployment:
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: dynamo-deepseek-epp
+  namespace: default
 ---
 kind: ClusterRoleBinding
 subjects:
-  - kind: ServiceAccount
-    name: default
-    namespace: default
+  - kind: ServiceAccount
+    name: dynamo-deepseek-epp
+    namespace: default
 roleRef:
   ...
 ...
 spec:
   containers:
+  serviceAccountName: dynamo-deepseek-epp
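Written out in full, the dedicated-ServiceAccount suggestion could look roughly like the sketch below. The binding name `dynamo-deepseek-epp-pod-read` is a hypothetical name chosen for illustration; `pod-read` is the ClusterRole name referenced elsewhere in this review. Note that `serviceAccountName` belongs in the pod spec, as a sibling of `containers`, not inside it.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dynamo-deepseek-epp
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dynamo-deepseek-epp-pod-read   # hypothetical binding name
subjects:
  - kind: ServiceAccount
    name: dynamo-deepseek-epp
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-read
---
# Fragment of the Deployment: only the fields relevant to the ServiceAccount.
spec:
  template:
    spec:
      serviceAccountName: dynamo-deepseek-epp   # pod-level field, sibling of containers
```

The three documents are shown together for readability; a YAML check that expects a single document per file would need them split into separate files.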
31-38: Use more specific RBAC resource names.
`pod-read` is too generic and may collide with other roles. Consider renaming to `dynamo-deepseek-pod-read` (and updating references accordingly).
deploy/inference-gateway/example/README.md (6)
7-8: Add article for consistency in prerequisites list.
- - Kubernetes cluster with kubectl configured
+ - A Kubernetes cluster with kubectl configured
🧰 Tools
🪛 LanguageTool
[grammar] ~7-~7: The singular proper name ‘Kubernetes’ must be used with a third-person or a past tense verb.
Context: ...quests. ## Prerequisites - Kubernetes cluster with kubectl configured - NVIDIA GPU dr... (HE_VERB_AGR)
42-43: Insert missing comma.
- In this example we'll install `kgateway`.
+ In this example, we'll install `kgateway`.
🧰 Tools
🪛 LanguageTool
[typographical] ~42-~42: It appears that a comma is missing.
Context: ...y an inference gateway service. In this example we'll install `kgateway`. Install the ... (DURING_THAT_TIME_COMMA)
50-51: Fix punctuation and unify casing for `kgateway`.
- Deploy an Inference Gateway. In this example we'll install `Kgateway`
+ Deploy an Inference Gateway. In this example, we'll install `kgateway`
🧰 Tools
🪛 LanguageTool
[typographical] ~50-~50: It appears that a comma is missing.
Context: ... Deploy an Inference Gateway. In this example we'll install `Kgateway` ```bash KGTW_V... (DURING_THAT_TIME_COMMA)
75-75: Hyphenate compound modifier.
- 4. **Apply Dynamo specific manifests**
+ 4. **Apply Dynamo-specific manifests**
🧰 Tools
🪛 LanguageTool
[uncategorized] ~75-~75: When ‘Dynamo-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ... True 1m ``` 4. Apply Dynamo specific manifests The Inference Gateway is c... (SPECIFIC_HYPHEN)
118-124: Normalize heading levels and correct typo.
Heading "4. Usage" should be a markdown heading and fix "Poulate" → "Populate".
- 4. Usage
+ ## Usage
@@
- ### 1: Poulate gateway url for your k8s cluster
+ ### 1. Populate gateway URL for your k8s cluster
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
124-124: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3 (MD001, heading-increment)
146-146: Specify language for fenced code block.
- ``` + ```bash
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
146-146: Fenced code blocks should have a language specified
null (MD040, fenced-code-language)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deploy/inference-gateway/example/README.md (1 hunks)
deploy/inference-gateway/example/inference-gateway-resources.yaml (1 hunks)
🧰 Additional context used
🪛 LanguageTool
deploy/inference-gateway/example/README.md
[grammar] ~7-~7: The singular proper name ‘Kubernetes’ must be used with a third-person or a past tense verb.
Context: ...quests. ## Prerequisites - Kubernetes cluster with kubectl configured - NVIDIA GPU dr...
(HE_VERB_AGR)
[typographical] ~42-~42: It appears that a comma is missing.
Context: ...y an inference gateway service. In this example we'll install `kgateway`. Install the ... (DURING_THAT_TIME_COMMA)
[typographical] ~50-~50: It appears that a comma is missing.
Context: ... Deploy an Inference Gateway. In this example we'll install `Kgateway` ```bash KGTW_V... (DURING_THAT_TIME_COMMA)
[uncategorized] ~75-~75: When ‘Dynamo-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ... True 1m ``` 4. Apply Dynamo specific manifests The Inference Gateway is c... (SPECIFIC_HYPHEN)
[uncategorized] ~129-~129: Possible missing comma found.
Context: ...ateway-URL> To test the gateway in minikube use the following: bash minikube tun... (AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
deploy/inference-gateway/example/README.md
124-124: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
146-146: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
🪛 Checkov (3.2.334)
deploy/inference-gateway/example/inference-gateway-resources.yaml
[MEDIUM] 119-171: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[MEDIUM] 119-171: Minimize the admission of root containers
(CKV_K8S_23)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/1533/merge) by biswapanda.
deploy/inference-gateway/example/inference-gateway-resources.yaml
[error] 16-30: YAML check failed: expected a single document in the stream but found another document starting at line 30, column 1.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
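The YAML check failure reported above ("expected a single document in the stream") is what a single-document YAML parser reports when a file contains `---` separators. One way to resolve it is to split the manifest into one file per resource. Another, sketched below, assumes the repository runs the `check-yaml` hook from `pre-commit-hooks` (the source does not show the actual configuration) and passes its `--allow-multiple-documents` flag; the pinned `rev` is an illustrative assumption.

```yaml
# .pre-commit-config.yaml (sketch; the repository's real config is not shown in this PR)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0                      # assumed version pin
    hooks:
      - id: check-yaml
        args: [--allow-multiple-documents]   # accept files containing several documents
```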
Actionable comments posted: 2
🧹 Nitpick comments (13)
deploy/inference-gateway/example/resources/cluster-role.yaml (2)
19-27: Consider consolidating duplicate rules for the same apiGroup.
You can merge the two `inference.networking.x-k8s.io` entries into one to reduce redundancy:
- - apiGroups: ["inference.networking.x-k8s.io"]
-   resources: ["inferencepools"]
-   verbs: ["get", "watch", "list"]
- - apiGroups: ["inference.networking.x-k8s.io"]
-   resources: ["inferencemodels"]
-   verbs: ["get", "watch", "list"]
+ - apiGroups: ["inference.networking.x-k8s.io"]
+   resources: ["inferencepools", "inferencemodels"]
+   verbs: ["get", "watch", "list"]
40-40: Add a trailing newline.
A newline at the end of the file will satisfy YAML lint rules.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 40-40: no new line character at the end of file (new-line-at-end-of-file)
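With the two `inference.networking.x-k8s.io` rules merged, the whole ClusterRole might read as follows. This is a sketch under assumptions: the review only quotes the inference rules and mentions `tokenreviews`/`subjectaccessreviews` create verbs, so the `pods` rule is inferred from the role name `pod-read` and may differ from the actual file.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-read
rules:
  # Consolidated rule covering both inference resources.
  - apiGroups: ["inference.networking.x-k8s.io"]
    resources: ["inferencepools", "inferencemodels"]
    verbs: ["get", "watch", "list"]
  # Assumed from the role name; not quoted in the review.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
  # Review permissions mentioned by the reviewer.
  - apiGroups: ["authentication.k8s.io"]
    resources: ["tokenreviews"]
    verbs: ["create"]
  - apiGroups: ["authorization.k8s.io"]
    resources: ["subjectaccessreviews"]
    verbs: ["create"]
```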
deploy/inference-gateway/example/resources/service.yaml (1)
28-28: Add a trailing newline.
Append a newline to satisfy YAML lint.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 28-28: no new line character at the end of file (new-line-at-end-of-file)
deploy/inference-gateway/example/resources/http-router.yaml (1)
34-34: Add a trailing newline.
Include a final newline to pass YAML linting.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 34-34: no new line character at the end of file (new-line-at-end-of-file)
deploy/inference-gateway/example/resources/inference-model.yaml (1)
26-26: Add a trailing newline.
A final newline satisfies YAML lint requirements.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 26-26: no new line character at the end of file (new-line-at-end-of-file)
deploy/inference-gateway/example/resources/inference-pool.yaml (1)
28-28: Add a trailing newline.
Include a newline at EOF to satisfy the linter.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 28-28: no new line character at the end of file (new-line-at-end-of-file)
deploy/inference-gateway/example/resources/dynamo-epp.yaml (1)
34-50: Enhance container security
To satisfy best practices and address Checkov warnings, add a minimal `securityContext` to drop privileges and enforce non-root execution:
 containers:
 - name: epp
+  securityContext:
+    runAsNonRoot: true
+    allowPrivilegeEscalation: false
   image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main
   imagePullPolicy: Always
   args:
deploy/inference-gateway/example/README.md (6)
7-7: Clarify prerequisite wording
Change to:
- A Kubernetes cluster with kubectl installed and configured
for consistency and to specify both installation and configuration steps.
🧰 Tools
🪛 LanguageTool
[grammar] ~7-~7: The singular proper name ‘Kubernetes’ must be used with a third-person or a past tense verb.
Context: ...quests. ## Prerequisites - Kubernetes cluster with kubectl configured - NVIDIA GPU dr... (HE_VERB_AGR)
42-42: Add missing comma & unify casing
Should read:
First, deploy an inference gateway service. In this example, we'll install `kgateway`.
(use lowercase “kgateway” consistently).
🧰 Tools
🪛 LanguageTool
[typographical] ~42-~42: It appears that a comma is missing.
Context: ...y an inference gateway service. In this example we'll install `kgateway`. Install the ... (DURING_THAT_TIME_COMMA)
50-50: Add missing comma & unify casing
Should read:
Deploy an Inference Gateway. In this example, we'll install `kgateway`.
[matches prior casing and comma style]
🧰 Tools
🪛 LanguageTool
[typographical] ~50-~50: It appears that a comma is missing.
Context: ... Deploy an Inference Gateway. In this example we'll install `Kgateway` ```bash KGTW_V... (DURING_THAT_TIME_COMMA)
75-75: Hyphenate compound modifier
Update the heading to:
4. **Apply Dynamo-specific manifests**
to correctly hyphenate “Dynamo-specific.”
🧰 Tools
🪛 LanguageTool
[uncategorized] ~75-~75: When ‘Dynamo-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ... True 1m ``` 4. Apply Dynamo specific manifests The Inference Gateway is c... (SPECIFIC_HYPHEN)
129-129: Add missing comma after introductory phrase
Should read:
To test the gateway in minikube, use the following:
for grammatical correctness.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~129-~129: Possible missing comma found.
Context: ...ateway-URL> To test the gateway in minikube use the following: bash minikube tun... (AI_HYDRA_LEO_MISSING_COMMA)
146-146: Specify code block language
Add a language identifier to the fenced block (e.g., bash) to satisfy Markdown linting:
- ``` + ```bash
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
146-146: Fenced code blocks should have a language specified
null (MD040, fenced-code-language)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
deploy/inference-gateway/example/README.md (1 hunks)
deploy/inference-gateway/example/resources/cluster-role-binding.yaml (1 hunks)
deploy/inference-gateway/example/resources/cluster-role.yaml (1 hunks)
deploy/inference-gateway/example/resources/dynamo-epp.yaml (1 hunks)
deploy/inference-gateway/example/resources/http-router.yaml (1 hunks)
deploy/inference-gateway/example/resources/inference-model.yaml (1 hunks)
deploy/inference-gateway/example/resources/inference-pool.yaml (1 hunks)
deploy/inference-gateway/example/resources/service.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- deploy/inference-gateway/example/resources/cluster-role-binding.yaml
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/inference-gateway/example/resources/cluster-role.yaml
[error] 40-40: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/http-router.yaml
[error] 34-34: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/inference-model.yaml
[error] 26-26: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/inference-pool.yaml
[error] 28-28: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/service.yaml
[error] 28-28: no new line character at the end of file
(new-line-at-end-of-file)
🪛 Checkov (3.2.334)
deploy/inference-gateway/example/resources/dynamo-epp.yaml
[MEDIUM] 15-67: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[MEDIUM] 15-67: Minimize the admission of root containers
(CKV_K8S_23)
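The two Checkov findings above (CKV_K8S_20 and CKV_K8S_23) are typically addressed with a container-level `securityContext`. The fragment below is a sketch of where those fields would sit in the Deployment's pod template, using the container name and image quoted earlier in this review. One caveat stated as an assumption: `runAsNonRoot: true` only works if the image is built to run as a non-root user, which is not confirmed here.

```yaml
# Fragment of the Deployment in dynamo-epp.yaml (sketch, not the full manifest).
spec:
  template:
    spec:
      containers:
        - name: epp
          image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main
          imagePullPolicy: Always
          securityContext:
            runAsNonRoot: true               # CKV_K8S_23: refuse to start as root
            allowPrivilegeEscalation: false  # CKV_K8S_20: block privilege escalation
```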
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (8)
deploy/inference-gateway/example/resources/cluster-role.yaml (2)
15-18: ClusterRole metadata is correctly defined.
The `apiVersion`, `kind`, and `metadata.name` (`pod-read`) align with expected RBAC conventions.
29-34: Verify authentication/authorization review permissions.
Ensure the `tokenreviews` and `subjectaccessreviews` create verbs match your gateway’s auth flow. If using SubjectAccessReview or TokenReview in the controller, confirm the ServiceAccount has these rights.
deploy/inference-gateway/example/resources/service.yaml (1)
21-23: Ensure Service selector matches the Deployment labels.
The Service selects pods with `app: dynamo-deepseek-epp`; confirm the `dynamo-epp.yaml` Deployment pod template uses the same label.
deploy/inference-gateway/example/resources/http-router.yaml (1)
20-24: Explicit namespace in parentRefs if Gateway is non-default.
By default, this HTTPRoute targets a Gateway in its own namespace. If `inference-gateway` lives elsewhere, add `namespace: <namespace>` under `parentRefs`.
deploy/inference-gateway/example/resources/inference-model.yaml (1)
15-19: InferenceModel declaration is valid.
API version, kind, metadata name/namespace, and spec fields look correct for linking to the `dynamo-deepseek` pool.
deploy/inference-gateway/example/resources/dynamo-epp.yaml (3)
15-22: Metadata consistency looks good
The Deployment’s name, namespace, and label selectors align correctly, ensuring proper pod targeting.
🧰 Tools
🪛 Checkov (3.2.334)
[MEDIUM] 15-67: Containers should not run with allowPrivilegeEscalation (CKV_K8S_20)
[MEDIUM] 15-67: Minimize the admission of root containers (CKV_K8S_23)
28-33: Grace period consistency verified
`terminationGracePeriodSeconds: 130` appropriately mirrors the longest pod shutdown time in the inference pool.
51-67: Probes and ports configuration is correct
The gRPC-based liveness/readiness probes on port 9003 and the metrics port are well-defined with sensible delays.
🧰 Tools
🪛 Checkov (3.2.334)
[MEDIUM] 15-67: Containers should not run with allowPrivilegeEscalation (CKV_K8S_20)
[MEDIUM] 15-67: Minimize the admission of root containers (CKV_K8S_23)
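To make the selector and probe comments above concrete, here is a hedged sketch of a Service and Deployment whose labels line up. The label `app: dynamo-deepseek-epp`, the probe port 9003, the grace period of 130 seconds, and port 9002 all appear in this conversation; the resource names, probe timings, and the use of 9002 as the Service port are assumptions for illustration and may not match the files in the PR.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dynamo-deepseek-epp          # assumed name
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: dynamo-deepseek-epp         # must equal the pod template label below
  ports:
    - protocol: TCP
      port: 9002                     # assumed from the sequence diagram
      targetPort: 9002
      appProtocol: http2
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dynamo-deepseek-epp          # assumed name
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dynamo-deepseek-epp
  template:
    metadata:
      labels:
        app: dynamo-deepseek-epp     # label the Service selects on
    spec:
      terminationGracePeriodSeconds: 130
      containers:
        - name: epp
          image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main
          ports:
            - containerPort: 9002
            - containerPort: 9003
          livenessProbe:
            grpc:
              port: 9003
            initialDelaySeconds: 5   # assumed timing
            periodSeconds: 10        # assumed timing
          readinessProbe:
            grpc:
              port: 9003
            initialDelaySeconds: 5   # assumed timing
            periodSeconds: 10        # assumed timing
```

If the Service selector and the pod template label drift apart, the Service ends up with no endpoints, which is the failure mode the reviewer's check is guarding against.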
Overall, it reads well but needs a pass to fix the headings and subheadings.
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu> Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (5)
deploy/inference-gateway/example/README.md (5)
26-26: Remove trailing whitespace
Pre-commit hook flagged a trailing whitespace on this line. Please run `pre-commit run --all-files` or manually remove the extra space to resolve the pipeline failure.
75-75: Use hyphen in compound modifier
Change “Dynamo specific manifests” to “Dynamo-specific manifests” for correct hyphenation.
124-124: Fix spelling and casing in header
Correct “Poulate gateway url for your k8s cluster” to “Populate Gateway URL for your Kubernetes cluster”.
129-129: Add missing comma
Insert a comma after “minikube” for proper punctuation:
-To test the gateway in minikube use the following:
+To test the gateway in minikube, use the following:
146-146: Specify code block language
The fenced code block here has no language. Add “bash” for syntax highlighting:
-``` +```bash
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deploy/inference-gateway/example/README.md (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/1533/merge) by biswapanda.
deploy/inference-gateway/example/README.md
[error] 26-26: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'. Please run 'pre-commit run --all-files' locally to fix.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build and Test - vllm
- GitHub Check: Mirror Repository to GitLab
Overview:
Added deployment and usage instructions for Inference Gateway with Dynamo on Kubernetes.
1.demo-Inference-Gateway-POL.mp4
Where should the reviewer start?
The README (deploy/inference-gateway/example/README.md) explains the steps to deploy Dynamo with the inference gateway.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)