ForFishes
Member

PR types

New features

PR changes

Others

Description

[Distributed] Support dp/sharding overlap in virtual pp

@ForFishes ForFishes merged commit f275ad2 into PaddlePaddle:incubate/new_frl Jul 26, 2023
@ForFishes ForFishes deleted the add_dp_overlap branch July 26, 2023 03:39
FeixLiu added a commit to FeixLiu/Paddle that referenced this pull request Aug 8, 2023
FeixLiu added a commit that referenced this pull request Aug 9, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 20, 2023
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 22, 2023
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 23, 2023
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 25, 2023
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 28, 2023
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 30, 2023
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 5, 2023
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
zhiqiu pushed a commit that referenced this pull request Dec 6, 2023
* part-3 cherry from: add check for cembedding (#55621)

* part-3 fix cherry from: add check for cembedding

* part-3 fix c_embedding

* fix test_gpt_with_pir caused by pir

* part-3 cherry from: [Distributed] Support dp/sharding overlap in virtual pp (#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log

* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)

* [FlashAttn] add flash randomness control (#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>

* part-4 cherry from: fix codestyle (#56066)

* part-4 cherry from (no change): Add assert for static and other platform (#56044)

* part-4 cherry-pick from: dp and sharding coexist (#56096)

* dp and sharding coexist

* dp

* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)

* add debug information

* fix log

* fix log

* add detach for pp

* part-4 cherry from: [BugFix] Fix bug in paddle.device.cuda.synchronize() (#56451)

* fix bug in synchronize

* fix bug in synchronize

* part-4 cherry from: add fused gradient (#57048)

* part-4 cherry from: [Distributed] add eager_communication_connection for eager mode in nccl (#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection

* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)

* fix h2d bandwidth

* remove useless flags

* fix cherry pick #56066

* part-5 cherry from: Add allocation debug FLAGS (#57797)

* Add allocation debug FLAGS

* add sync after value set

* refine flags

* part-5 cherry from: fix softmax backward (#57971)

* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* part-5 cherry from: [Distributed]Add unbalance batch for virtual pp (#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* fix

* fix comments

* fix kunlun compatibility issues

* fix test_fused_rotary_position_embedding.py

* fix allocator.h

* tinyfix

* fix conflicts

* fix new ir translator c_embedding failure

---------

Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: umiswing <umiswing@foxmail.com>
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
2 participants