[Mcore] context parallel #970
Thanks a ton for the quick support! Have you done any benchmarking or testing of training efficiency with context parallelism?
@ccclyu Please try it out and share your feedback.
 # TODO: support ep
-return os.path.join(checkpoint_path, f"optim", f"distrib_optim_pp{pp_rank}_tp{tp_rank}.pt")
+return os.path.join(checkpoint_path, f"optim", f"distrib_optim_pp{pp_rank}_tp{tp_rank}_cp{cp_rank}_dp{dp_rank}.pt")
@ETOgaosion Because the optimizer states are distributed across all GPUs, each DP rank's shard should also be saved separately, so the DP rank needs to be part of the file name.
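To make the point about DP ranks concrete: with a distributed optimizer each rank holds a different shard of the optimizer state, so two ranks that map to the same file name would presumably collide on save. The sketch below is hypothetical: `get_distrib_optim_path` and `old_distrib_optim_path` are illustrative names not taken from the PR, and only the f-string patterns come from the diff. It counts how many distinct files each naming scheme yields on a toy layout.

```python
import os
from itertools import product


def get_distrib_optim_path(checkpoint_path, pp_rank, tp_rank, cp_rank, dp_rank):
    # Hypothetical wrapper around the new return statement from the diff:
    # each (pp, tp, cp, dp) coordinate maps to its own optimizer-shard file.
    return os.path.join(
        checkpoint_path,
        "optim",
        f"distrib_optim_pp{pp_rank}_tp{tp_rank}_cp{cp_rank}_dp{dp_rank}.pt",
    )


def old_distrib_optim_path(checkpoint_path, pp_rank, tp_rank):
    # Hypothetical wrapper around the previous naming, keyed only on PP and TP ranks.
    return os.path.join(checkpoint_path, "optim", f"distrib_optim_pp{pp_rank}_tp{tp_rank}.pt")


# Enumerate every rank in a toy 2 x 2 x 2 x 2 (pp x tp x cp x dp) layout.
ranks = list(product(range(2), range(2), range(2), range(2)))
new_paths = [get_distrib_optim_path("ckpt", pp, tp, cp, dp) for pp, tp, cp, dp in ranks]
old_paths = [old_distrib_optim_path("ckpt", pp, tp) for pp, tp, _, _ in ranks]

print(len(ranks), len(set(new_paths)), len(set(old_paths)))  # 16 16 4
```

With the old scheme, 16 ranks share only 4 file names, whereas the new scheme gives every rank its own file.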
Got it. I'll sync this to the docs later.
Support context parallelism for the Mcore backend. Changes to:
* configs
* model loader
* checkpoint
* single controller dispatcher
* forward preprocess and postprocess

---------
Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
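The commit message lists forward preprocess and postprocess among the changes, but that code is not shown here. As a hedged illustration only, the sketch below shows one common way context parallelism partitions a sequence: the load-balanced split into 2 * cp_size chunks used by Megatron-LM's `get_batch_on_this_cp_rank`, where rank r keeps chunks r and 2 * cp_size - 1 - r. Whether this PR's preprocessing follows the same scheme is an assumption; `split_for_cp_rank` and `gather_from_cp_ranks` are hypothetical helpers that operate on plain Python lists rather than tensors.

```python
def split_for_cp_rank(tokens, cp_size, cp_rank):
    # Hypothetical load-balanced split: cut the sequence into 2 * cp_size
    # equal chunks and give rank r chunks r and (2 * cp_size - 1 - r).
    # Under causal attention later chunks cost more, so pairing an early
    # chunk with a late one evens out the work across CP ranks.
    n_chunks = 2 * cp_size
    if len(tokens) % n_chunks != 0:
        raise ValueError(f"length {len(tokens)} is not divisible by {n_chunks}")
    size = len(tokens) // n_chunks
    lo, hi = cp_rank, n_chunks - 1 - cp_rank
    return tokens[lo * size:(lo + 1) * size] + tokens[hi * size:(hi + 1) * size]


def gather_from_cp_ranks(shards, cp_size):
    # Hypothetical inverse of split_for_cp_rank: put each rank's two halves
    # back into their original chunk slots and concatenate them in order.
    n_chunks = 2 * cp_size
    chunks = [None] * n_chunks
    for cp_rank, shard in enumerate(shards):
        half = len(shard) // 2
        chunks[cp_rank] = shard[:half]
        chunks[n_chunks - 1 - cp_rank] = shard[half:]
    return [tok for chunk in chunks for tok in chunk]


seq = list(range(16))
shards = [split_for_cp_rank(seq, cp_size=2, cp_rank=r) for r in range(2)]
print(shards[0])  # [0, 1, 2, 3, 12, 13, 14, 15]
print(shards[1])  # [4, 5, 6, 7, 8, 9, 10, 11]
assert gather_from_cp_ranks(shards, cp_size=2) == seq
```

Pairing chunk r with its mirror chunk means each rank sees roughly the same number of causal-attention score entries, which is the usual motivation for this layout over a naive contiguous split.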
Support context parallelism for the Mcore backend.
Changes to: