
Conversation

@chewong (Collaborator) commented Aug 7, 2025

Reason for Change:

Upgrade vLLM and Transformers to the latest stable versions as a prerequisite to #1354.

  • Transformers 4.55.0 conflicts with vLLM 0.9.0, so we are forced to upgrade vLLM to 0.10.0. We will need a follow-up PR to upgrade vLLM to 0.10.1 anyway, which starts supporting the models in #1354 (Add gpt-oss-20b and gpt-oss-120b as KAITO Presets).
  • Added "max_new_tokens": None to generate_kwargs in presets/workspace/inference/text-generation/tests/test_inference_api.py, since max_new_tokens (default 200) takes precedence over max_length, causing the test to generate more tokens than it is supposed to; see the sketch after this list.
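
A minimal sketch of that test change, assuming the test builds a `generate_kwargs` dictionary that is passed through to Hugging Face `generate()`; the surrounding test code and the `max_length` value here are hypothetical:

```python
# Hypothetical excerpt in the spirit of test_inference_api.py; the
# max_length value is illustrative, not the actual test configuration.
generate_kwargs = {
    "max_length": 128,       # the total-length cap the test wants enforced
    "max_new_tokens": None,  # explicitly unset so the default (200) cannot
                             # take precedence over max_length
}
```

In Hugging Face generation, a non-`None` `max_new_tokens` always wins over `max_length`, so passing `None` explicitly restores the `max_length` bound.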

TODO after merging this PR:

  • Deprecate T4 SKUs, which are not supported by vLLM V1

TODO when upgrading vLLM to 0.11.0 (the v0 engine will be completely removed):

  • Deprecate and remove phi-2, which is not supported by vLLM V1

Requirements

  • Added unit tests and e2e tests (if applicable).

Issue Fixed:

Notes for Reviewers:

@chewong (Collaborator, Author) commented Aug 7, 2025

This PR got flagged by https://github.com/kaito-project/kaito/actions/runs/16813076971/job/47623004035?pr=1373 for using torch==2.7.1 due to GHSA-887c-mr87-cxwp, but we can't upgrade torch to 2.8.0 until vLLM upgrades it as well.

@chewong (Collaborator, Author) commented Aug 7, 2025

Hold until we release 0.6.0

chewong added 2 commits on August 8, 2025, 09:38 (both Signed-off-by: Ernest Wong <chwong719@gmail.com>)
@chewong (Collaborator, Author) commented Aug 8, 2025

Closing in favor of #1378

@chewong closed this on Aug 8, 2025.