

mzj104 (Contributor) commented Apr 15, 2025

PR Category

Execute Infrastructure

PR Types

New features

Description

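Add 0-size tensor support to isclose and Tensor.isclose.

The change process is described below:

Searching the paddle.isclose error logs in PaddleAPITest report/0size_tensor turned up a CUDA error(9). Analysis suggested that the failure was in the forward pass.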

2025-03-05 15:41:48.693059 test begin: paddle.isclose(Tensor([0, 10],"float64"), Tensor([0, 10],"float64"), rtol=1e-05, atol=1e-08, )

[cuda error] paddle.isclose(Tensor([0, 10],"float64"), Tensor([0, 10],"float64"), rtol=1e-05, atol=1e-08, ) 
 (External) CUDA error(9), invalid configuration argument. 
  [Hint: 'cudaErrorInvalidConfiguration'. This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requestingmore shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks.See cudaDeviceProp for more device limitations.] (at ../paddle/fluid/pybind/eager_functions.cc:1388)
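One plausible reading of this error: CUDA rejects a kernel launch whose grid dimension is 0, and a grid size derived from numel() becomes 0 when the tensor is empty. The sketch below assumes the kernel computes its grid as ceil(numel / threads_per_block); launch_config, is_valid_launch, and the block size of 256 are hypothetical, not Paddle's actual launch code.

```python
# Hypothetical sketch of a 1-D launch configuration derived from the element
# count. THREADS_PER_BLOCK = 256 is an assumed value, not taken from Paddle.
THREADS_PER_BLOCK = 256


def launch_config(numel: int, threads: int = THREADS_PER_BLOCK) -> tuple[int, int]:
    """Return (blocks, threads) using ceiling division over numel."""
    blocks = (numel + threads - 1) // threads
    return blocks, threads


def is_valid_launch(blocks: int, threads: int) -> bool:
    # A grid or block dimension of 0 is rejected by CUDA
    # (cudaErrorInvalidConfiguration), so both must be at least 1.
    return blocks >= 1 and threads >= 1


# Tensor([0, 10]) has 0 * 10 = 0 elements, so the grid collapses to 0 blocks.
blocks, threads = launch_config(0 * 10)
print(blocks, is_valid_launch(blocks, threads))
```

Under this assumption, returning early before the launch whenever numel() == 0 avoids the invalid configuration entirely.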

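Forward fix:
a. Searching the Paddle code for def isclose showed that the core implementation of isclose calls _C_ops.isclose.
b. Searching paddle/phi/ops/yaml for the _C_ops isclose op showed that its InferMeta function is ValueCompareInferMeta.
c. Searching the code for ValueCompareInferMeta and checking its dims (shape) inference showed that the inference is correct for isclose, so it needed no change.
d. Searching paddle/phi/kernels for isclose to find every isclose kernel implementation turned up four related files: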

Paddle/paddle/phi/kernels/isclose_kernel.h
Paddle/paddle/phi/kernels/impl/isclose_kernel_impl.h
Paddle/paddle/phi/kernels/cpu/isclose_kernel.cc
Paddle/paddle/phi/kernels/gpu/isclose_kernel.cu

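Both the .cc and .cu files include the two .h files, so only the headers need to change. Because the logic must not (and need not) be defined twice across isclose_kernel.h and isclose_kernel_impl.h, only isclose_kernel_impl.h was modified.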
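Step c can be pictured with a small Python model of shape inference. It assumes ValueCompareInferMeta requires x and y to have identical dims and gives out the dims of x with a bool dtype; value_compare_infer_meta is a hypothetical stand-in, not the C++ function. The point is that the inference itself has no trouble with a 0 dimension, which is why only the kernel needed a change.

```python
import math


def value_compare_infer_meta(x_dims, y_dims):
    """Hypothetical model: x and y must share dims; out gets x's dims, bool dtype."""
    if list(x_dims) != list(y_dims):
        raise ValueError(f"dims mismatch: {list(x_dims)} vs {list(y_dims)}")
    return list(x_dims), "bool"


out_dims, out_dtype = value_compare_infer_meta([3, 0, 5], [3, 0, 5])
# A 0 in any dimension makes numel() (the product of the dims) equal to 0.
print(out_dims, out_dtype, math.prod(out_dims))
```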

The following code was added to paddle/phi/kernels/impl/isclose_kernel_impl.h to complete the fix:

// 0-size input: allocate the empty bool output and skip the kernel launch.
if (x.numel() == 0) {
  dev_ctx.template Alloc<bool>(out);
  return;
}
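The early return can be mirrored in a NumPy reference to show the intended behavior: for an empty input, produce an empty bool output of the same shape and skip the elementwise work. This is a sketch, not Paddle's kernel; isclose_reference is a hypothetical helper that assumes x and y already share a shape and applies the same tolerance rule as numpy.isclose.

```python
import numpy as np


def isclose_reference(x, y, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Hypothetical NumPy reference for isclose with a 0-size fast path.

    Assumes x and y already have the same shape (no broadcasting).
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.size == 0:
        # Mirrors the kernel's early return: an empty bool output of the
        # same shape, with no elementwise comparison performed.
        return np.empty(x.shape, dtype=bool)
    close = np.zeros(x.shape, dtype=bool)
    finite = np.isfinite(x) & np.isfinite(y)
    # Finite pairs use the numpy.isclose rule: |x - y| <= atol + rtol * |y|.
    close[finite] = np.abs(x[finite] - y[finite]) <= atol + rtol * np.abs(y[finite])
    # Non-finite pairs are close only when exactly equal (e.g. inf == inf).
    close[~finite] = x[~finite] == y[~finite]
    if equal_nan:
        # Optionally treat NaN vs NaN as close.
        close |= np.isnan(x) & np.isnan(y)
    return close


print(isclose_reference(np.zeros([3, 0, 5]), np.zeros([3, 0, 5])).shape)
```

Keeping the output dtype as bool in the fast path matters: an empty output allocated with the input's floating-point type would not match what isclose returns for non-empty inputs.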

Adding unit tests:

A unit test with 0-size tensor inputs was added to test/legacy_test/test_isclose_op.py:

paddle.isclose(x=Tensor([3, 0, 5],"float64"), y=Tensor([3, 0, 5],"float64"), )

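# Added to test/legacy_test/test_isclose_op.py; TestIscloseOp is the existing base case there.
class TestIsclosezerosize(TestIscloseOp):
    def set_args(self):
        # 0-size inputs: the middle dimension is 0, so numel() == 0.
        self.input = np.zeros([3, 0, 5]).astype("float64")
        self.other = np.zeros([3, 0, 5]).astype("float64")
        self.rtol = np.array([1e-05]).astype("float64")
        self.atol = np.array([1e-08]).astype("float64")
        self.equal_nan = False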
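The expected result for this case can be sanity-checked outside Paddle's test harness with numpy.isclose. This is a standalone sketch using plain unittest, not the TestIscloseOp machinery (whose reference computation is not shown here); the class name ZeroSizeIscloseExpectation is hypothetical. It also covers the [0, 10] shape from the CUDA error log.

```python
import unittest

import numpy as np


class ZeroSizeIscloseExpectation(unittest.TestCase):
    """Hypothetical standalone check of the expected 0-size isclose output."""

    # Shapes from the unit test above and from the CUDA error log.
    SHAPES = ([3, 0, 5], [0, 10])

    def test_empty_bool_output(self):
        for shape in self.SHAPES:
            with self.subTest(shape=shape):
                x = np.zeros(shape).astype("float64")
                y = np.zeros(shape).astype("float64")
                out = np.isclose(x, y, rtol=1e-05, atol=1e-08, equal_nan=False)
                # The output keeps the input shape, is boolean, and is empty.
                self.assertEqual(out.shape, tuple(shape))
                self.assertEqual(out.dtype, np.bool_)
                self.assertEqual(out.size, 0)
```

Run it with python -m unittest on the file that contains it.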

Note: although every isclose configuration already passed the accuracy checks, out was not being allocated.


paddle-bot bot commented Apr 15, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot added the contributor (External developers) label Apr 15, 2025
luotao1 added the HappyOpenSource Pro (advanced edition of the Happy Open Source event, with more challenging tasks) label Apr 15, 2025
mzj104 (Contributor, Author) commented Apr 16, 2025

@cangtianhuang

cangtianhuang (Contributor) left a comment
Please make a revision 😊. Also, the test cases could add the configuration from the error log to make sure the issue is resolved.

if (out && out->numel() == 0) {
  dev_ctx.template Alloc<bool>(out);
  return;
}
Contributor

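I think this check is wrong. See the isclose API documentation: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/isclose_cn.html#isclose
0-size means the inputs x and y are 0-size tensors. The output of isclose is a boolean tensor, and out is never empty, so checking out->numel() == 0 does not seem right.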

Contributor

I think using the numel of any of x, y, or out should be fine here, right? All three have the same dims~
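The point about matching dims can be checked with NumPy standing in for Paddle tensors (an assumption for illustration only): for same-shaped inputs, the isclose output has the same shape, so x, y, and out always agree on their element count, and a 0-size check on any one of them selects the same cases.

```python
import numpy as np

# NumPy stands in for Paddle tensors; the shapes mirror the discussion.
for shape in ([3, 0, 5], [0, 10], [2, 3]):
    x = np.zeros(shape)
    y = np.zeros(shape)
    out = np.isclose(x, y)
    # All three share one shape, so their element counts (numel) match.
    assert x.size == y.size == out.size
    print(shape, x.size == 0, out.size == 0)
```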

self.atol = np.array([1e-08]).astype("float64")
self.equal_nan = False


Contributor

Rename Osize to ZeroSize.

mzj104 requested a review from cangtianhuang April 16, 2025 03:53
Co-authored-by: 苍天荒 <1903374751@qq.com>
cangtianhuang (Contributor) left a comment

cangtianhuang (Contributor) commented:

@mzj104 Please update the PR title by adding the prefix 【BIT】.

mzj104 changed the title from "isclose Tensor.isclose support 0-size" to "【BIT】isclose Tensor.isclose support 0-size" Apr 17, 2025
wanghuancoder (Contributor) left a comment

LGTM


wanghuancoder merged commit fea94f3 into PaddlePaddle:develop Apr 21, 2025
37 of 38 checks passed
YqGe585 pushed a commit to YqGe585/Paddle that referenced this pull request May 7, 2025
* fix isclose

* fix

* fix

* fix

* Update test/legacy_test/test_isclose_op.py

Co-authored-by: 苍天荒 <1903374751@qq.com>

---------

Co-authored-by: 苍天荒 <1903374751@qq.com>
4 participants