[Distribution] Support DualPipeV #71427
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
LGTM
LGTM
LGTM
* [Distribution] Support DualPipeV
* [Distribution] Support DualPipeV (#71427)
  * [Distribution] Support DualPipeV
  * [Distributed] Add fail-fast for dualpipev (#71977)
  * [Distribution] support ScheduleNode for overlapping in dualpipev (#71665)
  * [Distribution] support ScheduleNode for overlapping in dualpipev
  * fix
  * opt mem
  * [Bug fix] fix mem leakage in dualpipev (#72070)
  * fix code style
  * fix pipeline in dynamic_shape
* [Distribution] Support DualPipeV
  * fix
  * fix
PR Category
Distributed Strategy
PR Types
New features
Description
An implementation of the DeepSeek-V3 DualPipeV schedule, based on https://github.com/deepseek-ai/DualPipe/blob/main/dualpipe/dualpipev.py
For the pipeline schedule
Usage: set `use_dualpipev=True` for both your `PipelineLayer` and the `strategy.hybrid_configs`.
The following code can be run with `python -m paddle.distributed.launch --gpus="0,1,2,3" demo.py`.
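The snippet below is a minimal sketch of such a `demo.py`. The toy model, layer sizes, and batch settings are hypothetical, and the exact nesting of `use_dualpipev` inside `hybrid_configs` (placed under `pp_configs` here) is an assumption; only the two `use_dualpipev=True` switches themselves come from the description above.

```python
# demo.py -- a minimal, unverified sketch of enabling DualPipeV.
import paddle
import paddle.nn as nn
from paddle.distributed import fleet
from paddle.distributed.fleet.meta_parallel import LayerDesc, PipelineLayer


class MLPPipe(PipelineLayer):
    # Toy pipeline model: 8 Linear layers ending in a 10-class head.
    def __init__(self, **kwargs):
        descs = [LayerDesc(nn.Linear, 1024, 1024) for _ in range(7)]
        descs.append(LayerDesc(nn.Linear, 1024, 10))
        super().__init__(layers=descs, loss_fn=nn.CrossEntropyLoss(), **kwargs)


strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": 1,
    "mp_degree": 1,
    "pp_degree": 4,
    # Assumption: the DualPipeV switch is a pipeline-parallel config.
    "pp_configs": {"use_dualpipev": True},
}
strategy.pipeline_configs = {"accumulate_steps": 8, "micro_batch_size": 2}
fleet.init(is_collective=True, strategy=strategy)

hcg = fleet.get_hybrid_communicate_group()
model = MLPPipe(
    num_stages=4,
    topology=hcg.topology(),
    use_dualpipev=True,  # the PipelineLayer-side switch named in the description
)
optimizer = paddle.optimizer.AdamW(learning_rate=1e-4, parameters=model.parameters())

model = fleet.distributed_model(model)
optimizer = fleet.distributed_optimizer(optimizer)

# One global batch (micro_batch_size * accumulate_steps = 16 samples).
x = paddle.randn([16, 1024])
y = paddle.randint(0, 10, shape=[16, 1])
loss = model.train_batch([x, y], optimizer)
print(loss)
```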
For the SplitBW Linear
SplitBW Linear is used for the zero-bubble pipeline proposed in https://arxiv.org/abs/2401.10241.
Use `paddle.distributed.fleet.meta_parallel.zero_bubble_utils.SplitBWLinear` to replace the standard `nn.Linear`. Notably, `SplitBWLinear` can only be used in DualPipeV; otherwise, users need to manage the `WeightGradStore` themselves to ensure that all weight gradients are calculated.

Pcard-76459
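As a rough illustration, the swap could look like the sketch below. The import path is the one given above; the `FeedForward` block and its sizes are hypothetical, and it is an assumption of this sketch that `SplitBWLinear` mirrors the `nn.Linear` constructor signature.

```python
import paddle.nn as nn
from paddle.distributed.fleet.meta_parallel.zero_bubble_utils import SplitBWLinear


class FeedForward(nn.Layer):
    """Hypothetical block showing the drop-in replacement of nn.Linear."""

    def __init__(self, hidden_size, inner_size):
        super().__init__()
        # Assumption: SplitBWLinear takes (in_features, out_features) like nn.Linear.
        self.fc1 = SplitBWLinear(hidden_size, inner_size)  # was: nn.Linear(hidden_size, inner_size)
        self.act = nn.GELU()
        self.fc2 = SplitBWLinear(inner_size, hidden_size)  # was: nn.Linear(inner_size, hidden_size)

    def forward(self, x):
        # The weight-gradient computation of the two linears is deferred via
        # WeightGradStore; per the note above, DualPipeV takes care of draining it.
        return self.fc2(self.act(self.fc1(x)))
```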