
SigureMo
Member

@SigureMo SigureMo commented Sep 14, 2022

PR types

Others

PR changes

Others

Describe

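The NPU unit tests failed in #45937, so the changes to the NPU unit tests were temporarily reverted there. This PR is for reproducing the problem and attempting to fix it.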

@paddle-bot

paddle-bot bot commented Sep 14, 2022

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the results of the automated CI tests first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Sep 14, 2022
@luotao1 luotao1 self-assigned this Sep 14, 2022
@SigureMo
Member Author

SigureMo commented Sep 14, 2022

Almost all NPU unit tests report the following error:

image

It appears to be caused by the two lines introduced in #45541 (if np_dtype == "bfloat16"):

if np_dtype == "bfloat16":
    dtype = np.uint16
else:
    dtype = np.dtype(np_dtype)

Debugging confirmed that the error is indeed raised when evaluating np_dtype == "bfloat16". It can be caught on CI as follows (for the log, see log-with-catch-error.log):

https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/6621555/job/18450611

try:
    np_dtype == "bfloat16"
except TypeError:
    print("[Catched TypeError] np_dtype: ", np_dtype, ", type: ",
          type(np_dtype))
    # [Catched TypeError] np_dtype:  float32 , type:  <class 'numpy.dtype'>
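
The snippet above only works if a plain == comparison can raise. As a rough illustration of how that is possible, here is a minimal sketch using a hypothetical stand-in class, CoercingDType. It is not NumPy's implementation; it only assumes that __eq__ tries to convert the other operand into the same type before comparing, so an unrecognized name raises TypeError instead of yielding False.

```python
class CoercingDType:
    """Hypothetical stand-in for a dtype-like object (not NumPy's code)."""

    KNOWN = {"float32", "float64", "uint16"}

    def __init__(self, name):
        if name not in self.KNOWN:
            raise TypeError(f"data type {name!r} not understood")
        self.name = name

    def __eq__(self, other):
        # Assumed behaviour: coerce the other operand first, so an
        # unknown name raises here rather than comparing unequal.
        if not isinstance(other, CoercingDType):
            other = CoercingDType(other)
        return self.name == other.name


def describe_comparison(value):
    """Compare value with "bfloat16" and report a TypeError as text."""
    try:
        return value == "bfloat16"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe_comparison("bfloat16"))                # plain str comparison
print(describe_comparison(CoercingDType("float32")))  # coercion fails
```

With a str on the left-hand side the comparison is an ordinary string comparison; only the dtype-like object triggers the coercion path.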

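The error is suspected to come from some conversion performed inside the np.dtype.__eq__ method, so changing the code as below resolves the problem. However, since I cannot reproduce it locally, I am not sure about the root cause or whether this fix is appropriate.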

if isinstance(np_dtype, str) and np_dtype == "bfloat16":
    dtype = np.uint16
else:
    dtype = np.dtype(np_dtype)
# same as #46065
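
For reference, a minimal sketch of the guarded check wrapped in a function, so that its behaviour for string and np.dtype inputs can be tried in isolation. The helper name resolve_numpy_dtype is hypothetical and not part of Paddle. Unlike the snippet above, it returns np.dtype(np.uint16) rather than the np.uint16 class, purely so that every branch returns the same kind of object.

```python
import numpy as np


def resolve_numpy_dtype(np_dtype):
    """Hypothetical helper wrapping the guarded bfloat16 check."""
    # Compare against the string only when np_dtype is itself a str, so
    # np.dtype.__eq__ is never asked to interpret "bfloat16".
    if isinstance(np_dtype, str) and np_dtype == "bfloat16":
        # Stock NumPy has no bfloat16 dtype; uint16 has the same width.
        return np.dtype(np.uint16)
    return np.dtype(np_dtype)


for value in ("bfloat16", "float32", np.dtype("float32"), np.float64):
    print(repr(value), "->", resolve_numpy_dtype(value))
```

Because isinstance is evaluated first and the and short-circuits, an np.dtype input goes straight to the else path without ever being compared to the string.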

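Although this problem is resolved, some other problems remain (three unit tests failed; for the log, see log-after-fix-eq-error.log), so I have temporarily reverted all of the changes.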

@luotao1
Contributor

luotao1 commented Sep 15, 2022

@qili93 Could you help take a look at the NPU CI problem?

@luotao1
Contributor

luotao1 commented Sep 15, 2022

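I discussed this with @qili93 and @zhangbo9674:

  • The fix for bfloat16 is correct. Since you were the first to verify it, please roll back your revert and complete the bfloat16 fix.
  • The remaining three unit test failures have completely different error messages and may be separate errors. We are investigating them.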

This reverts commit b3222ad.
Contributor

@luotao1 luotao1 left a comment


LGTM

@luotao1 luotao1 merged commit 5022dd9 into PaddlePaddle:develop Sep 15, 2022
@SigureMo SigureMo deleted the trailing-whitespace-npu-ut branch September 15, 2022 08:21