Add vLLM Mixtral and TRT-LLM qnemo export tests (plus a couple of bugfixes) #13697

janekl · 2025-05-22T14:01:48Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Adding L2_NeMo_2_vLLM_Export_Mixtral and L2_NeMo_2_Export_Qnemo_TRT_LLM tests
Pinning transformers==4.51.3, as newer versions set head_dim=None in the Mixtral HF config (which corrupts the subsequent vLLM export)
Removing the no-longer-used nemo/export/vllm/tokenizer_group.py module
Several minor extensions and bugfixes
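
To illustrate the head_dim issue behind the pin, here is a minimal sketch of how a consumer of such a config could derive the value when it is left unset. This is not what this PR does (the PR pins transformers instead); `resolve_head_dim` is a hypothetical helper, and the `SimpleNamespace` object only stands in for an HF config with Mixtral-8x7B-like sizes.

```python
from types import SimpleNamespace


def resolve_head_dim(config):
    """Return the per-head dimension, deriving it when the config leaves it unset."""
    head_dim = getattr(config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    # Fall back to the conventional definition: hidden size split evenly across heads.
    if config.hidden_size % config.num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return config.hidden_size // config.num_attention_heads


# Stand-in for a Mixtral-style config whose head_dim is None (assumed values).
cfg = SimpleNamespace(hidden_size=4096, num_attention_heads=32, head_dim=None)
print(resolve_head_dim(cfg))
```

Code that reads `config.head_dim` directly would receive `None` here, which is the kind of value that can break a downstream export step.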

Collection: LLM

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?
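
For reference, an import guard for an optional library could look roughly like the sketch below. The helper `optional_import`, the flag names, and the package name `some_optional_pkg_not_installed` are all hypothetical and are not taken from the NeMo codebase; `json` merely stands in for an optional package that happens to be installed.

```python
import importlib


def optional_import(name):
    """Try to import an optional dependency; return (module, available)."""
    try:
        return importlib.import_module(name), True
    except ImportError:
        return None, False


# "json" is stdlib and stands in for an installed optional package.
json_mod, HAVE_JSON = optional_import("json")
# Hypothetical package name, assumed not to be installed.
missing_mod, HAVE_MISSING = optional_import("some_optional_pkg_not_installed")


def export_with_optional_backend():
    """Fail with a clear message only when the optional feature is actually used."""
    if not HAVE_MISSING:
        raise RuntimeError("some_optional_pkg_not_installed is required for this feature")
    return missing_mod
```

The point of the pattern is that importing the module never fails; the error surfaces only when the guarded feature is called.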

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

github-actions · 2025-05-23T15:55:43Z

[🤖]: Hi @janekl 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

…fixes) (NVIDIA#13697)

* Pin transformers and comment on Mixtral
* Remove unused tokenizer_group.py as a follow-up to NVIDIA#13498
* Extend test_hf_import.py for other model classes and configs
* Translate the number of experts for MoE models
* Pass overwrite flag around
* Rename & fix a typo in L2_NeMo_2_VLLM_EXPORT test
* Add vLLM Mixtral export test
* Bugfix for AWQ-like methods
* Add int8_sq qnemo TRTLLM export test

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>

…fixes) (NVIDIA#13697)

* Pin transformers and comment on Mixtral
* Remove unused tokenizer_group.py as a follow-up to NVIDIA#13498
* Extend test_hf_import.py for other model classes and configs
* Translate the number of experts for MoE models
* Pass overwrite flag around
* Rename & fix a typo in L2_NeMo_2_VLLM_EXPORT test
* Add vLLM Mixtral export test
* Bugfix for AWQ-like methods
* Add int8_sq qnemo TRTLLM export test

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: jianbinc <shjwudp@gmail.com>

janekl added 9 commits May 22, 2025 10:33

Pin transformers and comment on Mixtral

77063d1

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Remove unused tokenizer_group.py as a follow-up to #13498

458b9b2

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Extend test_hf_import.py for other model classes and configs

741304a

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Translate the number of experts for MoE models

9d214d2

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Pass overwrite flag around

1bf74be

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Rename & fix a typo in L2_NeMo_2_VLLM_EXPORT test

150b5a9

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Add vLLM Mixtral export test

0823008

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Bugfix for AWQ-like methods

8bc66b8

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

Add int8_sq qnemo TRTLLM export test

60a93b5

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

janekl requested review from pablo-garay, ko3n1g, thomasdhc and chtruong814 as code owners May 22, 2025 14:01

github-actions bot added the CI label May 22, 2025

janekl requested review from oyilmaz-nvidia and Laplasjan107 May 22, 2025 14:02

janekl changed the title ~~Jlasek/vllm and qnemo tests~~ Add vLLM Mixtral and TRT-LLM qnemo export tests (plus a couple of bugfixes) May 22, 2025

oyilmaz-nvidia previously approved these changes May 22, 2025

oyilmaz-nvidia enabled auto-merge (squash) May 22, 2025 14:09

Forgot to really add the test

c4ea543

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

janekl dismissed oyilmaz-nvidia’s stale review via c4ea543 May 22, 2025 15:06

janekl added the Run CICD label May 22, 2025

janekl temporarily deployed to test May 22, 2025 15:09 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label May 23, 2025

ko3n1g approved these changes May 26, 2025

oyilmaz-nvidia merged commit 65a1a23 into main May 26, 2025
280 of 281 checks passed

oyilmaz-nvidia deleted the jlasek/vllm_and_qnemo_tests branch May 26, 2025 10:55

janekl mentioned this pull request May 27, 2025

ci: Enable e2e tests NVIDIA-NeMo/Export-Deploy#17

Merged

janekl mentioned this pull request May 27, 2025

Mirror recent updates from NVIDIA/NeMo NVIDIA-NeMo/Export-Deploy#23

Merged
