
JamesSand
Contributor

Motivation

According to this comment in issue #2219, some quantization arguments may not run properly. This PR therefore adds a unit test that checks whether each quantization argument runs properly.

Modifications

This PR adds a new unit test file, test/srt/test_srt_engine_with_quant_args.py, which contains two parts:

  1. Test the --quantization argument. Currently it only tests fp8, because the other methods depend on vllm. They can be added back to the test once the vllm dependency is resolved.
  2. Test the --torchao-config argument. Currently it does not test int8dq, because int8dq conflicts with CUDA graph capture, as mentioned in the Motivation section.
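A minimal sketch of how the two parts could be organized follows. It is not the contents of test_srt_engine_with_quant_args.py: the `TORCHAO_CONFIGS` list is an assumption, and `launch_and_generate` is a hypothetical callback standing in for whatever launches the engine with one quantization argument and runs a short generation. Only the choice of fp8 and the int8dq skip come from the PR description.

```python
from typing import Callable, Dict, List

# Only fp8 is exercised for --quantization; other methods currently depend on vllm.
QUANTIZATION_METHODS = ["fp8"]

# ASSUMPTION: an illustrative set of --torchao-config values, not the PR's exact list.
TORCHAO_CONFIGS = ["int4wo-128", "int8wo", "int8dq", "fp8wo"]

# int8dq is skipped because it conflicts with CUDA graph capture.
SKIPPED_TORCHAO_CONFIGS = {"int8dq"}


def build_cases() -> List[Dict[str, str]]:
    """Return one engine-argument dict per case, with one quantization argument each."""
    cases = [{"quantization": m} for m in QUANTIZATION_METHODS]
    cases += [
        {"torchao_config": c}
        for c in TORCHAO_CONFIGS
        if c not in SKIPPED_TORCHAO_CONFIGS
    ]
    return cases


def run_all(launch_and_generate: Callable[[Dict[str, str]], str]) -> Dict[str, bool]:
    """Run every case through the (hypothetical) callback and record pass/fail.

    A case passes if the callback returns a non-empty output without raising.
    """
    results: Dict[str, bool] = {}
    for kwargs in build_cases():
        label = ",".join(f"{k}={v}" for k, v in kwargs.items())
        try:
            output = launch_and_generate(kwargs)
            results[label] = bool(output)
        except Exception:
            results[label] = False
    return results
```

Keeping the case list separate from the runner makes it a one-line change to add other `--quantization` methods back once the vllm dependency is resolved, or to drop int8dq from the skip set once the CUDA graph conflict is fixed.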

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

merrymercy merged commit a74d194 into sgl-project:main on Dec 26, 2024
14 checks passed
JamesSand deleted the unittest branch on December 27, 2024 05:40
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025