【Hackathon 6th No.34】support return micro batch loss for dygraph train_batch #64218
PR Category
Auto Parallel
PR Types
New features
Description
Support returning the loss of each micro batch during dygraph pipeline-parallel training.

The main idea is to change the accumulation strategy of `self.total_loss` so that it stores the loss of every micro batch instead of summing them in place. When the switch is enabled, the stored losses are merged into a single tensor and returned; otherwise the losses are merged (averaged) following the original logic.

When the switch is enabled, the loss broadcast to the other ranks also contains all micro batch losses. The loss that participates in the backward computation is the same as before; that part of the logic is unchanged.
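
A minimal sketch of the changed accumulation and merge strategy, for illustration only; the class, the switch name `return_micro_batch_loss`, and the method names are assumptions for this example, not the actual `PipelineParallel` API:

```python
import paddle


class LossAggregatorSketch:
    def __init__(self, return_micro_batch_loss=False):
        # Switch: when True, return every micro batch loss instead of the mean.
        self.return_micro_batch_loss = return_micro_batch_loss
        # total_loss now stores each micro batch loss individually,
        # rather than keeping a running sum.
        self.total_loss = []

    def accumulate(self, micro_batch_loss):
        # Called once per micro batch; detach so the stored copies do not
        # affect the backward pass, which still uses the per-step loss.
        self.total_loss.append(micro_batch_loss.detach())

    def merge(self):
        if self.return_micro_batch_loss:
            # Switch on: concatenate all micro batch losses into one tensor.
            return paddle.concat(
                [loss.reshape([1]) for loss in self.total_loss]
            )
        # Switch off: original behavior, average into a single scalar loss.
        return paddle.add_n(self.total_loss) / len(self.total_loss)
```

Under this scheme the merged tensor (all micro batch losses when the switch is on, the averaged scalar otherwise) is what gets broadcast to the other ranks, while backward still consumes each micro batch loss as it is produced.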