In PR #2203, I noticed that the deployment machine has only 8 GPUs, yet the parameters for dp (data parallelism) and tp (tensor parallelism) are both set to 8. From my understanding, with tp=8, a machine with only 8 GPUs can support at most dp=1. I understand the corresponding principle for ep (expert parallelism, e.g., ep=dp), but I cannot make sense of how these values in the PR's deployment command fit together. Can someone explain this to me? Thanks!
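To make my confusion concrete, here is a small sketch of my mental model (the function name and numbers are my own illustration, not anything from the PR's actual launcher): if the dp and tp groups do not share GPUs, the required world size is dp × tp.

```python
def required_gpus(dp: int, tp: int) -> int:
    """World size needed if dp replicas and tp shards do not share GPUs."""
    return dp * tp

gpus = 8
tp = 8
max_dp = gpus // tp          # under this model, at most dp=1 fits
print(max_dp)                # 1
print(required_gpus(8, 8))   # 64 -- far more than the 8 GPUs on the machine
```

Under this model, dp=8 with tp=8 would need 64 GPUs, so I assume the PR's dp must mean something different (e.g., the dp groups live inside the tp group), but I'd appreciate confirmation.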