
Conversation

xiaobochen123 (Contributor)

Motivation

Fix a MoE expert-parallelism (EP) bug that caused an accuracy regression when loading FP8 models.
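The patch itself is not shown in this conversation, so the following is a purely illustrative sketch (not the actual fix) of the class of bug the title names: under expert parallelism, each rank loads only its own slice of experts, and for an FP8 checkpoint the per-expert quantization scales must be sliced with the same expert range as the weights. The function names and the even-divisibility assumption here are hypothetical.

```python
# Illustrative sketch, NOT the actual SGLang patch: with expert parallelism
# (EP), each rank owns a contiguous slice of experts. For an FP8 checkpoint,
# the per-expert weight scales must be sliced with the SAME expert range as
# the weights; forgetting to slice the scales (one plausible form of such a
# bug) makes ranks dequantize with the wrong scales and silently hurts
# accuracy rather than crashing.

def ep_expert_range(num_experts: int, ep_size: int, ep_rank: int) -> range:
    """Contiguous expert ids owned by `ep_rank` (assumes even divisibility)."""
    per_rank = num_experts // ep_size
    return range(ep_rank * per_rank, (ep_rank + 1) * per_rank)

def load_fp8_experts(weights, scales, ep_size, ep_rank):
    """Return this rank's (weights, scales), sliced with one shared range."""
    local = ep_expert_range(len(weights), ep_size, ep_rank)
    return [weights[e] for e in local], [scales[e] for e in local]

# Toy example: 8 experts, EP size 2 -> rank 1 owns experts 4..7,
# and it gets scale s4..s7 to match, not rank 0's s0..s3.
w = [f"w{e}" for e in range(8)]
s = [f"s{e}" for e in range(8)]
lw, ls = load_fp8_experts(w, s, ep_size=2, ep_rank=1)
```

The key invariant is that a single expert range drives both slices; computing the range twice in separate code paths is exactly how weights and scales can drift apart.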

Test model: neuralmagic/DeepSeek-Coder-V2-Instruct-FP8

Accuracy: 0.932
Invalid: 0.000
Latency: 243.824 s
Output throughput: 1027.530 token/s

cc: @ispobock

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs zhyncs changed the title fix moe-ep bug fix moe-ep accuracy issue for fp8 Dec 16, 2024
@zhyncs zhyncs merged commit b532a5f into sgl-project:main Dec 16, 2024
15 checks passed
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025