
Conversation

zhuangqh
Collaborator

@zhuangqh zhuangqh commented Aug 6, 2025

Reason for Change:

  • bump base image to 0.0.5
    • update basic py lib
    • add chat template for deepseek series model (see the sketch below)
  • migrate preset test to A10
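
As a hedged illustration, the new template would presumably be wired in the same way as the existing falcon/mistral presets quoted later on this page, via vLLM's --chat-template flag (the deepseek template path here is hypothetical):

    vllm:
      command: "python3 /workspace/vllm/inference_api.py --dtype float16 --chat-template /workspace/chat_templates/deepseek-r1.jinja"  # template path is hypothetical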


Signed-off-by: zhuangqh <zhuangqhc@gmail.com>

kaito-pr-agent bot commented Aug 6, 2025

Title

Update base image and VM sizes for A10 series


Description

  • Updated base image to version 0.0.5

  • Changed node VM size to A10 series across multiple configurations

  • Reduced max-parallel jobs in e2e-preset-test workflow

  • Added max-model-len parameter in inference-tmpl manifest (see the sketch below)
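
For context, a minimal sketch of where the new parameter sits, assuming the inference-tmpl template forwards max-model-len to vLLM (only max-model-len: 2048 is from the diff; the surrounding keys are illustrative, not the actual manifest schema):

    inference:                # illustrative structure, not the real schema
      vllm:
        max-model-len: 2048   # caps prompt + generated tokens per request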


Changes walkthrough 📝

Relevant files

Enhancement

e2e-preset-configs.json: Update VM sizes to A10 series
.github/e2e-preset-configs.json
  • Updated node-vm-size to Standard_NV36ads_A10_v5 for various models
  • Updated node-vm-size to Standard_NV72ads_A10_v5 for qwen2.5-coder-7b-instruct
  +14/-14

manifest.yaml: Add max-model-len parameter
presets/workspace/test/manifests/inference-tmpl/manifest.yaml
  • Added max-model-len parameter with value 2048
  +1/-0

Configuration changes

e2e-preset-test.yaml: Reduce parallel job limit
.github/workflows/e2e-preset-test.yaml
  • Reduced max-parallel from 10 to 5
  +1/-1

supported_models.yaml: Update base image tag
presets/workspace/models/supported_models.yaml
  • Updated tag from 0.0.4 to 0.0.5
  +2/-1

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.

kaito-pr-agent bot commented Aug 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Configuration Update

Ensure that the new VM size Standard_NV36ads_A10_v5 is compatible with the current infrastructure and that it provides the necessary resources for the workloads.

      "node-vm-size": "Standard_NV36ads_A10_v5",
      "node-osdisk-size": 100,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "falcon7b",
      "runtimes": {
        "hf": {
          "command": "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all /workspace/tfs/inference_api.py --pipeline text-generation --torch_dtype bfloat16",
          "gpu_count": 1
        },
        "vllm": {
          "command": "python3 /workspace/vllm/inference_api.py --kaito-max-probe-steps 6 --dtype float16 --chat-template /workspace/chat_templates/falcon-instruct.jinja",
          "gpu_count": 1
        }
      }
    },
    {
      "name": "falcon-7b-instruct",
      "node-count": 1,
      "node-vm-size": "Standard_NV36ads_A10_v5",
      "node-osdisk-size": 100,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "falcon7binst",
      "runtimes": {
        "hf": {
          "command": "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all /workspace/tfs/inference_api.py --pipeline text-generation --torch_dtype bfloat16",
          "gpu_count": 1
        },
        "vllm": {
          "command": "python3 /workspace/vllm/inference_api.py --kaito-max-probe-steps 6 --dtype float16 --chat-template /workspace/chat_templates/falcon-instruct.jinja",
          "gpu_count": 1
        }
      }
    },
    {
      "name": "falcon-40b",
      "node-count": 1,
      "node-vm-size": "Standard_NC48ads_A100_v4",
      "node-osdisk-size": 400,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "falcon40b",
      "runtimes": {
        "hf": {
          "command": "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all /workspace/tfs/inference_api.py --pipeline text-generation --torch_dtype bfloat16",
          "gpu_count": 2
        },
        "vllm": {
          "command": "python3 /workspace/vllm/inference_api.py --dtype bfloat16 --chat-template /workspace/chat_templates/falcon-instruct.jinja --tensor-parallel-size 2",
          "gpu_count": 2
        }
      }
    },
    {
      "name": "falcon-40b-instruct",
      "node-count": 1,
      "node-vm-size": "Standard_NC48ads_A100_v4",
      "node-osdisk-size": 400,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "falcon40bins",
      "runtimes": {
        "hf": {
          "command": "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all /workspace/tfs/inference_api.py --pipeline text-generation --torch_dtype bfloat16",
          "gpu_count": 2
        },
        "vllm": {
          "command": "python3 /workspace/vllm/inference_api.py --dtype bfloat16 --chat-template /workspace/chat_templates/falcon-instruct.jinja --tensor-parallel-size 2",
          "gpu_count": 2
        }
      }
    },
    {
      "name": "mistral-7b",
      "node-count": 1,
      "node-vm-size": "Standard_NV36ads_A10_v5",
      "node-osdisk-size": 100,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "mistral7b",
      "runtimes": {
        "hf": {
          "command": "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all /workspace/tfs/inference_api.py --pipeline text-generation --torch_dtype bfloat16",
          "gpu_count": 1
        },
        "vllm": {
          "command": "python3 /workspace/vllm/inference_api.py --kaito-max-probe-steps 6 --dtype float16 --chat-template /workspace/chat_templates/mistral-instruct.jinja",
          "gpu_count": 1
        }
      }
    },
    {
      "name": "mistral-7b-instruct",
      "node-count": 1,
      "node-vm-size": "Standard_NV36ads_A10_v5",
      "node-osdisk-size": 100,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "mistral7bins",
      "runtimes": {
        "hf": {
          "command": "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all /workspace/tfs/inference_api.py --pipeline text-generation --torch_dtype bfloat16",
          "gpu_count": 1
        },
        "vllm": {
          "command": "python3 /workspace/vllm/inference_api.py --kaito-max-probe-steps 6 --dtype float16",
          "gpu_count": 1
        }
      }
    },
    {
      "name": "phi-2",
      "node-count": 1,
      "node-vm-size": "Standard_NV36ads_A10_v5",
      "node-osdisk-size": 50,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "phi2",
      "runtimes": {
        "hf": {
          "command": "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all /workspace/tfs/inference_api.py --pipeline text-generation --torch_dtype bfloat16",
          "gpu_count": 1
        },
        "vllm": {
          "command": "python3 /workspace/vllm/inference_api.py --kaito-max-probe-steps 6 --dtype float16",
          "gpu_count": 1
        }
      }
    },
    {
      "name": "phi-3-mini-4k-instruct",
      "node-count": 1,
      "node-vm-size": "Standard_NV36ads_A10_v5",
      "node-osdisk-size": 50,
      "OSS": true,
      "loads_adapter": false,
      "node_pool": "phi3mini4kin",
      "runtimes": {
        "hf": {
Parallel Execution

Verify that reducing max-parallel from 10 to 5 does not negatively impact the test execution time or resource utilization.

    max-parallel: 5
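
For context, a minimal sketch of the surrounding workflow block, assuming the usual GitHub Actions matrix layout (only max-parallel: 5 is from the diff; the other keys are illustrative):

    strategy:
      fail-fast: false                                      # illustrative
      max-parallel: 5                                       # reduced from 10 in this PR
      matrix:
        model: ${{ fromJson(needs.setup.outputs.models) }}  # hypothetical matrix source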
Version Update

Confirm that the version bump to 0.0.5 includes all necessary changes and that the chat-template updates do not break existing functionality.

    tag: 0.0.5
    # Tag history:
    # 0.0.5 - update chat-template for new models
    # 0.0.4 - bump vLLM to 0.9.0.
    # 0.0.3 - add dependencies to support vLLM multi-node distributed inference
    # 0.0.2 - bump vLLM to 0.8.5. Tool chat template added.
    # 0.0.1 - Initial Release
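
For reference, a hedged sketch of how a model entry might pin this tag in supported_models.yaml (only tag: 0.0.5 and the history comments above are from the diff; the entry layout and model name are assumptions):

    models:
      - name: falcon-7b-instruct   # illustrative entry
        tag: 0.0.5                 # shared base-image tag bumped in this PR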

@zhuangqh
Collaborator Author

zhuangqh commented Aug 6, 2025

This PR needs to read a secret from the repo to push the image, so I need to push the branch to the main repo instead of a fork. @chewong
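
For background, a minimal sketch of why a fork will not work here: GitHub does not expose repository secrets to pull_request-triggered workflows from forks, so a push step like the one below only authenticates when the branch lives in the main repo (step and secret names are hypothetical):

    - name: Log in and push base image    # hypothetical step
      run: |
        echo "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USER" --password-stdin
        docker push "$REGISTRY/kaito-base:0.0.5"
      env:
        REGISTRY: ${{ secrets.REGISTRY }}              # secrets resolve empty on fork PRs
        REGISTRY_USER: ${{ secrets.REGISTRY_USER }}
        REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}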

kaito-pr-agent bot commented Aug 6, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General

Validate VM size compatibility

Verify that Standard_NV72ads_A10_v5 is compatible with the workload requirements and cost considerations.

.github/e2e-preset-configs.json [159]

    +"node-vm-size": "Standard_NV72ads_A10_v5",

Suggestion importance [1-10]: 6

Why: The suggestion asks to verify the compatibility of the VM size with workload requirements and cost considerations, which is important but not critical. The existing code and improved code are identical, so the score is capped at 6.

Impact: Low

Assess parallel job impact

Ensure that reducing max-parallel to 5 does not impact the testing throughput and resource utilization.

.github/e2e-preset-configs.json [131]

    +"max-parallel": 5,

Suggestion importance [1-10]: 6

Why: The suggestion asks to ensure that reducing max-parallel to 5 does not impact testing throughput and resource utilization, which is important but not critical. The existing code and improved code are identical, so the score is capped at 6.

Impact: Low

Verify tag updates

Confirm that the new tag 0.0.5 includes all necessary updates and is backward compatible.

presets/workspace/models/supported_models.yaml [6]

    +tag: 0.0.5

Suggestion importance [1-10]: 6

Why: The suggestion asks to confirm that the new tag 0.0.5 includes all necessary updates and is backward compatible, which is important but not critical. The existing code and improved code are identical, so the score is capped at 6.

Impact: Low

Check model length setting

Ensure that setting max-model-len to 2048 meets the model's requirements and does not cause truncation issues.

presets/workspace/test/manifests/inference-tmpl/manifest.yaml [97]

    +max-model-len: 2048

Suggestion importance [1-10]: 6

Why: The suggestion asks to ensure that setting max-model-len to 2048 meets the model's requirements and does not cause truncation issues, which is important but not critical. The existing code and improved code are identical, so the score is capped at 6.

Impact: Low
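
To make the truncation concern concrete, here is a hedged sketch of the effective vLLM launch if the template forwards the value (--max-model-len is a real vLLM flag; the pass-through via inference_api.py is an assumption):

    vllm:
      command: "python3 /workspace/vllm/inference_api.py --kaito-max-probe-steps 6 --dtype float16 --max-model-len 2048"
      # 2048 caps prompt + generated tokens per request, so prompts near the
      # cap leave little headroom for generation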

codecov bot commented Aug 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

    @@           Coverage Diff           @@
    ##             main    #1364   +/-   ##
    =======================================
      Coverage   58.39%   58.39%           
    =======================================
      Files          76       76           
      Lines        8720     8720           
    =======================================
      Hits         5092     5092           
      Misses       3406     3406           
      Partials      222      222           
    Component     Coverage Δ
    workspace     46.85% <ø> (ø)
    presets       87.84% <ø> (ø)
    main          ∅ <ø> (∅)

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
@zhuangqh zhuangqh merged commit 4ad29f8 into main Aug 7, 2025
18 of 23 checks passed
@zhuangqh zhuangqh deleted the bump-img-005 branch August 7, 2025 05:51
farooqameen pushed a commit to farooqameen/kaito that referenced this pull request Aug 7, 2025

**Reason for Change**:

- bump base image to 0.0.5
  - update basic py lib
  - add chat template for deepseek series model
- migrate preset test to A10

---------

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
Co-authored-by: Fei Guo <vrgf2003@gmail.com>