【Hackathon 6th No.34】support return micro batch loss for dygraph train_batch #64218
PR Category
Auto Parallel
PR Types
New features
Description
Support returning the loss of each micro batch during dygraph pipeline-parallel training.

The main idea is to change the accumulation strategy of `self.total_loss` so that it stores the loss of every micro batch instead of summing them in place. When the switch is enabled, the stored losses are merged into a single tensor and returned; otherwise the losses are merged (averaged) following the original logic.

When the switch is enabled, the loss broadcast to the other ranks also contains all micro batch losses. The loss that participates in the backward computation is the same as before; that part of the logic is unchanged.
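
A minimal sketch of the changed accumulation and merge strategy, for illustration only; the class, the switch name `return_micro_batch_loss`, and the method names are assumptions for this example, not the actual `PipelineParallel` API:

```python
import paddle


class LossAggregatorSketch:
    def __init__(self, return_micro_batch_loss=False):
        # Switch: when True, return every micro batch loss instead of the mean.
        self.return_micro_batch_loss = return_micro_batch_loss
        # total_loss now stores each micro batch loss individually,
        # rather than keeping a running sum.
        self.total_loss = []

    def accumulate(self, micro_batch_loss):
        # Called once per micro batch; detach so the stored copies do not
        # affect the backward pass, which still uses the per-step loss.
        self.total_loss.append(micro_batch_loss.detach())

    def merge(self):
        if self.return_micro_batch_loss:
            # Switch on: concatenate all micro batch losses into one tensor.
            return paddle.concat(
                [loss.reshape([1]) for loss in self.total_loss]
            )
        # Switch off: original behavior, average into a single scalar loss.
        return paddle.add_n(self.total_loss) / len(self.total_loss)
```

Under this scheme the merged tensor (all micro batch losses when the switch is on, the averaged scalar otherwise) is what gets broadcast to the other ranks, while backward still consumes each micro batch loss as it is produced.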