
Question about the EP parameter in the deployment command for an MoE model #3704

@tanzelin430

In PR #2203, I noticed that the deployment machine has only 8 GPUs, yet dp (data parallelism) and tp (tensor parallelism) are both set to 8. As I understand it, with tp=8 a machine with 8 GPUs can support a dp of at most 1. I understand how ep (expert parallelism) relates to the other values in principle (e.g., ep=dp), but I can't work out how the values in this PR's deployment command fit together. Can someone explain this to me? Thanks!
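The arithmetic behind this question can be written out as a minimal sketch. It assumes the conventional layout in which each data-parallel replica owns its own group of `tp` GPUs, so that `tp * dp` must not exceed the GPU count. That layout is an assumption, and the PR's framework may interpret `dp` differently. The helper names `max_dp` and `fits` are hypothetical and not part of any library.

```python
def max_dp(num_gpus: int, tp: int) -> int:
    """Largest dp when every dp replica needs its own tp GPUs (assumed layout)."""
    if num_gpus <= 0 or tp <= 0:
        raise ValueError("num_gpus and tp must be positive")
    return num_gpus // tp


def fits(num_gpus: int, tp: int, dp: int) -> bool:
    """True if tp * dp GPUs are available under the same assumed layout."""
    return tp * dp <= num_gpus


if __name__ == "__main__":
    print(max_dp(8, 8))      # largest dp for 8 GPUs with tp=8
    print(fits(8, 8, 8))     # would dp=8, tp=8 fit on 8 GPUs?
```

Under this assumption, dp=8 with tp=8 would need 64 GPUs, which is why the values in the command look inconsistent to me.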
