Skip to content

Conversation

AndPuQing
Copy link
Contributor

PR Category

User Experience

PR Types

New features

Description

黑客松 25 题拆分PR

Copy link

paddle-bot bot commented Apr 9, 2024

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 9, 2024
@luotao1 luotao1 changed the title 【Hackathon 6th No.25】为 paddle.histogram 进行功能对齐与功能增强 【Hackathon 6th No.25】为 paddle.histogram 进行功能对齐与功能增强 -part Apr 9, 2024
@AndPuQing
Copy link
Contributor Author

关于Weight参数的兼容性,是需要参考这篇文档中, 向op_version添加一个add_input

@zhwesky2010
Copy link
Contributor

@AndPuQing 如果是在原API的最后面添加默认参数,且默认值也不会改变原有行为的,那么就不会有不兼容问题,无需修改其他位置。

因此你在修改的过程中,需要进行兼容性的设计,确保之前的代码运行时不会失败(语法失败、结果改变)。

@AndPuQing
Copy link
Contributor Author

AndPuQing commented Apr 12, 2024

我将Weight放置在后面,现有的单侧会出现错误

image

不知道是否可以修改现有单测?

@@ -1351,8 +1351,9 @@
backward : heaviside_grad

- op : histogram
args : (Tensor input, int64_t bins = 100, int min = 0, int max = 0)
args : (Tensor input, Tensor weight, int64_t bins = 100, int min = 0, int max = 0, bool density = false)
Copy link
Contributor

@zhwesky2010 zhwesky2010 Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个weight并没有放到最后面,你这个是不兼容的修改方式

@@ -2190,21 +2190,26 @@ def bmm(x, y, name=None):
return out


def histogram(input, bins=100, min=0, max=0, name=None):
def histogram(
Copy link
Contributor

@zhwesky2010 zhwesky2010 Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

你这个weight要放到最后面,进行兼容性的修改。

一个接口前面突然插入一个参数,导致所有参数位置发生改变,是一种不兼容的修改。

比如以前某行代码调用时没有指定关键字:paddle.histogram(inputs, 4, 0, 3),前面突然插个参数就会直接挂了,放后面就没有这个问题

@zhwesky2010
Copy link
Contributor

关于Weight参数的兼容性,是需要参考这篇文档中, 向op_version添加一个add_input

是的,需要修改op_version.yaml

Copy link

paddle-ci-bot bot commented Apr 20, 2024

Sorry to inform you that 4e8b5ac's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@luotao1 luotao1 added the API label Apr 24, 2024
else:
helper = LayerHelper('histogram', **locals())
check_variable_and_dtype(
input, 'X', ['int32', 'int64', 'float32', 'float64'], 'histogram'
)
out = helper.create_variable_for_type_inference(VarDesc.VarType.INT64)

if density or weight:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是不是 if weight:

Copy link

paddle-ci-bot bot commented Jun 12, 2024

Sorry to inform you that ef946d9's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@@ -2205,10 +2207,13 @@ def histogram(input, bins=100, min=0, max=0, name=None):
bins (int, optional): number of histogram bins. Default: 100.
min (int, optional): lower end of the range (inclusive). Default: 0.
max (int, optional): upper end of the range (inclusive). Default: 0.
weight (Tensor, optional): If provided, it must have the same shape as input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里除了需要写weight的形状外,还需要写weight的功能

float gap = static_cast<float>(nbins) /
static_cast<float>((output_max - output_min)) / *sum_data;
for (int64_t i = 0; i < nbins; i++) {
out_data[i] *= gap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个公式不是直接除以sum_data吗,上面这个gap的写法不太好看出公式,优化下写法

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里我是提前除以sum_data,然后就不用在每个for 循环中多一个除操作

Copy link
Contributor

@zhwesky2010 zhwesky2010 Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里我是提前除以sum_data,然后就不用在每个for 循环中多一个除操作

这里再乘以 static_cast<float>(nbins) / static_cast<float>((output_max - output_min)) 是因为?公式没看明白

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为连续型概率分布密度函数 (probability density function) 需要保证:

$$ \int _{-\infty}^{+\infty} f(x) , dx =1 $$

即密度函数与坐标轴围成面积为 $1$。当然在工程实践上需要离散化上述公式,即确保:

$$ \sum^{n}f(x) \Delta x= 1 $$

这里的 $\Delta x$ 是指 histogram 每个小矩形的宽度。

有别于在离散型随机变量上定义的概率质量函数 (probability mass function):

$$ f(x)=\begin{cases} P {X=x_{k}}=p_{k} & \text { 当 } x=x_{k}(k=1,2,3, \cdots) \text { 时 } \\ 0 & \text { 其他 } \end{cases} $$

二者的区别是,概率质量函数每个点表示该点的概率值,而分布密度函数得在区间做积分才能得出概率。

所以在计算时,在得出每个区间的 $p(x)$ 后需要标准化一下,

$$ f(x) = \frac{p(x)}{\Delta x} = \frac{N_{x}}{N \Delta x} $$

所以代码就有

float* sum_data = sum.data<float>();
float gap = static_cast<float>((output_max - output_min)) / 
            static_cast<float>(nbins);
for (int64_t i = 0; i < nbins; i++) {
      out_data[i] /= (gap * *sum_data)
}

考虑到除法相比于乘法更慢,所以变形了一下。

float* sum_data = sum.data<float>();
float gap = static_cast<float>(nbins) /
          static_cast<float>((output_max - output_min)) / *sum_data;
for (int64_t i = 0; i < nbins; i++) {
      out_data[i] *= gap;
}

Copy link
Contributor

@zhwesky2010 zhwesky2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

麻烦修改一下

float gap = static_cast<float>(nbins) /
static_cast<float>((output_max - output_min)) / *sum_data;
for (int64_t i = 0; i < nbins; i++) {
out_data[i] *= gap;
Copy link
Contributor

@zhwesky2010 zhwesky2010 Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里我是提前除以sum_data,然后就不用在每个for 循环中多一个除操作

这里再乘以 static_cast<float>(nbins) / static_cast<float>((output_max - output_min)) 是因为?公式没看明白

@@ -2072,8 +2072,30 @@ void HashInferMeta(const MetaTensor& x,
out->set_dtype(x.dtype());
}

void HistogramInferMeta(
const MetaTensor& input, int64_t bins, int min, int max, MetaTensor* out) {
void HashInferMeta(const MetaTensor& x,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的修改是不是commit弄错了,将其他commit内容弄过来了

}
}
if (density) {
DenseTensor sum = phi::Sum<float, Context>(
Copy link
Contributor

@zhwesky2010 zhwesky2010 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对output所有元素求和了之后得到sum,再output逐个除以sum,得到的值不是概率密度吗

这个输出结果和pytorch的结果进行了对比检查没

zhwesky2010
zhwesky2010 previously approved these changes Jul 1, 2024
Copy link
Contributor

@zhwesky2010 zhwesky2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

先合入,后面我们再通过Pytorch代码转换验证一遍结果。

@luotao1 luotao1 changed the title 【Hackathon 6th No.25】为 paddle.histogram 进行功能对齐与功能增强 -part 【Hackathon 6th No.25】为 paddle.histogram 进行功能对齐与功能增强 Jul 1, 2024
@luotao1
Copy link
Contributor

luotao1 commented Jul 2, 2024

  1. https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/11000308/job/26709448
2024-07-01 16:39:44 ++ exit 5\n++ '[' -d /paddle/build/pr_whl ']'
2024-07-01 16:39:44 ++ pip install /paddle/build/pr_whl/paddlepaddle_gpu-0.0.0-cp310-cp310-linux_x86_64.whl
2024-07-01 16:39:44 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
2024-07-01 16:39:44 ++ python -c 'import paddle;print(paddle.__version__);paddle.version.show()'
2024-07-01 16:39:44 ++ cd /paddle/tools
2024-07-01 16:39:44 ++ '[' cpu = cpu ']'
2024-07-01 16:39:44 ++ python sampcd_processor.py --mode cpu
2024-07-01 16:39:44 ----------------Check results--------------------
2024-07-01 16:39:44 >>> Sample code test capacity: {'cpu'}
2024-07-01 16:39:44 >>> 1 sample codes ran failed in env: {'cpu'}
2024-07-01 16:39:44 <DocTest(<modname?> paddle.Tensor.histogram_bin_edges:1:0 ln 1)>, running time: 0.899s
2024-07-01 16:39:44 >>> Mistakes found in sample codes in env: {'cpu'}!
2024-07-01 16:39:44 >>> Please recheck the sample codes.
2024-07-01 16:39:44 ----------------End of the Check--------------------
  1. https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/10999911/job/26708825
It is recommended to use 'np.testing.assert_allclose' and 'np.testing.assert_array_equal' instead of 'self.assertTrue(np.allclose(...))' and 'self.assertTrue(np.array_equal(...))'.
2024-07-01 15:34:22 Please modify the code below. If anything is unclear, please read the specification [ https://github.com/PaddlePaddle/community/blob/master/rfcs/CodeStyle/20220805_code_style_improvement_for_unittest.md#background ]. If it is a mismatch, please request SigureMo (Recommend) or luotao1 or Aurelius84 review and approve.
2024-07-01 15:34:22 The code that do not meet the specification are as follows:
2024-07-01 15:34:22 +        self.assertTrue(np.allclose(self.out, out))
  1. 看下覆盖率没过的原因,如果本地能过,请贴截图

Comment on lines 358 to 365

void HistogramInferMeta(
const MetaTensor& input, int64_t bins, int min, int max, MetaTensor* out);
void HistogramInferMeta(const MetaTensor& input,
const MetaTensor& weight,
int64_t bins,
int min,
int max,
bool density,
MetaTensor* out);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

新增参数之后属于binary类型算子,需要换个文件放了

@SigureMo SigureMo changed the title 【Hackathon 6th No.25】为 paddle.histogram 进行功能对齐与功能增强 【Hackathon 6th No.25】[Typing] 为 paddle.histogram 进行功能对齐与功能增强 Jul 4, 2024
Comment on lines 2356 to 2357
min: float = 0,
max: float = 0,
Copy link
Member

@SigureMo SigureMo Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这两个到底是 float 还是 int?为什么文档和类型提示不一致?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

实现也要移动吧?

Copy link
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMeow 🐾 for new API annotations

@luotao1 luotao1 changed the title 【Hackathon 6th No.25】[Typing] 为 paddle.histogram 进行功能对齐与功能增强 【Hackathon 6th No.25】[Typing] 为 paddle.histogram 进行功能对齐与功能增强 -part Jul 5, 2024
Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请修改对应的中文文档

Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants