Add vLLM Mixtral and TRT-LLM qnemo export tests (plus a couple of bugfixes) #13697


Merged — 10 commits merged into main on May 26, 2025

Conversation

janekl (Collaborator) commented May 22, 2025

Important

The Update branch button should be pressed only on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

  • Adding L2_NeMo_2_vLLM_Export_Mixtral and L2_NeMo_2_Export_Qnemo_TRT_LLM tests
  • Pinning transformers==4.51.3 as recent newer versions set head_dim=None in a Mixtral HF config (which corrupts subsequent vLLM export)
  • Removing no longer used nemo/export/vllm/tokenizer_group.py module
  • Several minor extensions and bugfixes
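For context on the transformers pin: newer releases can leave `head_dim` as `None` in a Mixtral HF config, and downstream vLLM export then reads an invalid value. A minimal sketch of the usual fallback, using a plain dict as a stand-in for the HF config object (the helper name is illustrative, not NeMo's API; the dimensions match Mixtral-8x7B):

```python
def resolve_head_dim(config: dict) -> int:
    """Fall back to hidden_size // num_attention_heads when head_dim is unset."""
    head_dim = config.get("head_dim")
    if head_dim is None:
        head_dim = config["hidden_size"] // config["num_attention_heads"]
    return head_dim

# Mixtral-8x7B-like config with head_dim left unset, as newer transformers may produce
mixtral_cfg = {"hidden_size": 4096, "num_attention_heads": 32, "head_dim": None}
print(resolve_head_dim(mixtral_cfg))  # 128
```

Pinning `transformers==4.51.3` sidesteps the issue entirely by keeping `head_dim` populated in the serialized config.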

Collection: LLM

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install (e.g. Numba, Pynini, Apex)?
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

janekl added 9 commits May 22, 2025 10:33
@github-actions github-actions bot added the CI label May 22, 2025
@janekl janekl changed the title Jlasek/vllm and qnemo tests Add vLLM Mixtral and TRT-LLM qnemo export tests (plus a couple of bugfixes) May 22, 2025
oyilmaz-nvidia previously approved these changes May 22, 2025
@oyilmaz-nvidia oyilmaz-nvidia enabled auto-merge (squash) May 22, 2025 14:09

[🤖]: Hi @janekl 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@oyilmaz-nvidia oyilmaz-nvidia merged commit 65a1a23 into main May 26, 2025
280 of 281 checks passed
@oyilmaz-nvidia oyilmaz-nvidia deleted the jlasek/vllm_and_qnemo_tests branch May 26, 2025 10:55
melllinia pushed a commit to melllinia/NeMo that referenced this pull request May 27, 2025
…fixes) (NVIDIA#13697)

* Pin transformers and comment on Mixtral
* Remove unused tokenizer_group.py as a follow-up to NVIDIA#13498
* Extend test_hf_import.py for other model classes and configs
* Translate the number of experts for MoE models
* Pass overwrite flag around
* Rename & fix a typo in L2_NeMo_2_VLLM_EXPORT test
* Add vLLM Mixtral export test
* Bugfix for AWQ-like methods
* Add int8_sq qnemo TRTLLM export test

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
shjwudp pushed a commit to shjwudp/NeMo that referenced this pull request May 31, 2025
3 participants