Performance fix for broadcast kernel [Part3] #46071

JamesLim-sy · 2022-09-15T06:35:39Z

PR types

Function optimization

PR changes

OPs

Describe

Feature:
Encapsulate the data loader for cases where most input tensors need broadcasting, and improve broadcast kernel performance in the following cases:
Source: op benchmark case_8

input_1.shape	input_2.shape	Dtype	Paddle PR (µs)	Paddle develop (µs)	Perf diff vs. develop	PyTorch (µs)	Perf diff vs. PyTorch
[32,1,1,128]	[1,12,128,1]	FP16	24.2	35.03	+30.92%	27.5	+12%
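In case_8 the output shape is [32,12,128,128], and neither input is laid out like the output, so every output element has to be mapped back to one element of each input. The plain C++ reference below sketches that index mapping under the assumption that both shapes have the same rank and are row-major. It is not Paddle's CUDA data loader; `BroadcastShape` and `InputOffset` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: broadcast output shape of two equal-rank shapes.
// Each dimension pair must be equal, or one of the two must be 1.
std::vector<int64_t> BroadcastShape(const std::vector<int64_t>& a,
                                    const std::vector<int64_t>& b) {
  if (a.size() != b.size()) throw std::invalid_argument("rank mismatch");
  std::vector<int64_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i] && a[i] != 1 && b[i] != 1)
      throw std::invalid_argument("shapes are not broadcastable");
    out[i] = (a[i] == 1) ? b[i] : a[i];
  }
  return out;
}

// Hypothetical helper: maps a row-major linear index in the output to the
// linear offset of the element that an input of shape `in` contributes.
// A size-1 input dimension is skipped, which is the same as giving it stride 0.
int64_t InputOffset(int64_t out_idx, const std::vector<int64_t>& out_shape,
                    const std::vector<int64_t>& in) {
  assert(in.size() == out_shape.size());
  int64_t offset = 0;
  int64_t in_stride = 1;  // row-major stride of dimension d in `in`
  for (int d = static_cast<int>(out_shape.size()) - 1; d >= 0; --d) {
    const int64_t coord = out_idx % out_shape[d];
    out_idx /= out_shape[d];
    if (in[d] != 1) offset += coord * in_stride;
    in_stride *= in[d];
  }
  return offset;
}
```

For example, the output element at (n=3, c=5, h=7, w=9) has linear index 672649 and reads input_1 at offset 3*128+9 = 393 and input_2 at offset 5*128+7 = 647. Walking along w, input_1 is read contiguously while input_2 returns the same element 128 times, which is the kind of access pattern a specialized broadcast loader can exploit.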

Source: typical ternary broadcast cases in AlphaFold

input_1.shape	input_2.shape	input_3.shape	Dtype	Paddle PR (µs)	Speedup over FP32 (PR)	Paddle develop (µs)	Speedup over FP32 (develop)	PR perf vs. develop
[1, 256, 4, 256, 256]	[1, 256, 1, 1, 256]	[1, 1, 4, 256, 256]	FP32	398.46	1.00	434.82	1.00	+8.36%
--	--	--	BF16	263.36	1.51	411.87	1.06	+36.06%
--	--	--	FP16	242.72	1.64	406.26	1.07	+40.26%

[1, 2048, 3584]	[1, 1, 3584]	[1, 2048, 1]	FP32	49.93	1.00	49.92	1.00	-0.02%
--	--	--	BF16	30.08	1.66	38.62	1.29	+22.10%
--	--	--	FP16	27.56	1.81	36.23	1.38	+23.93%

[1, 256, 256]	[1, 1, 256]	[1, 256, 1]	FP32	5.86	1.00	5.96	1.00	+1.54%
--	--	--	BF16	5.73	1.02	5.94	1.00	+3.53%
--	--	--	FP16	5.67	1.03	5.77	1.03	+1.80%
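Each ternary case combines three inputs that broadcast along different dimensions, e.g. [1, 2048, 3584] with [1, 1, 3584] and [1, 2048, 1]. Assuming the elementwise op is an add (as in the fused_gate_attention measurement), a naive CPU reference like the following can serve as a correctness oracle for a GPU kernel on small shapes. `Tensor`, `BroadcastOffset` and `TernaryBroadcastAdd` are hypothetical names, the inputs are assumed to be equal-rank, row-major float tensors, and this is not the PR's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical minimal tensor: a shape plus row-major float data.
struct Tensor {
  std::vector<int64_t> shape;
  std::vector<float> data;
};

// Offset into an input of shape `in` for a row-major output index.
// Size-1 input dimensions are broadcast (they contribute stride 0).
int64_t BroadcastOffset(int64_t out_idx, const std::vector<int64_t>& out_shape,
                        const std::vector<int64_t>& in) {
  int64_t offset = 0;
  int64_t stride = 1;
  for (int d = static_cast<int>(out_shape.size()) - 1; d >= 0; --d) {
    const int64_t coord = out_idx % out_shape[d];
    out_idx /= out_shape[d];
    if (in[d] != 1) offset += coord * stride;
    stride *= in[d];
  }
  return offset;
}

// Naive reference for out = a + b + c with broadcasting over equal-rank inputs.
Tensor TernaryBroadcastAdd(const Tensor& a, const Tensor& b, const Tensor& c) {
  const std::size_t rank = a.shape.size();
  assert(b.shape.size() == rank && c.shape.size() == rank);
  Tensor out;
  out.shape.resize(rank);
  int64_t numel = 1;
  for (std::size_t d = 0; d < rank; ++d) {
    // The output takes the largest size; every input must match it or be 1.
    const int64_t s = std::max({a.shape[d], b.shape[d], c.shape[d]});
    for (int64_t x : {a.shape[d], b.shape[d], c.shape[d]}) {
      assert(x == 1 || x == s);
      (void)x;  // avoid an unused-variable warning when NDEBUG disables assert
    }
    out.shape[d] = s;
    numel *= s;
  }
  out.data.resize(static_cast<std::size_t>(numel));
  for (int64_t i = 0; i < numel; ++i) {
    out.data[i] = a.data[BroadcastOffset(i, out.shape, a.shape)] +
                  b.data[BroadcastOffset(i, out.shape, b.shape)] +
                  c.data[BroadcastOffset(i, out.shape, c.shape)];
  }
  return out;
}
```

A scaled-down version of the second case, [1, 2, 3] + [1, 1, 3] + [1, 2, 1], exercises the same pattern: the second input repeats along dimension 1 and the third along dimension 2.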

Source: ternary add kernel performance in fused_gate_attention (AlphaFold)

Dtype	Paddle PR (µs)	Speedup over FP32 (PR)	Paddle develop (µs)	Speedup over FP32 (develop)	PR perf vs. develop
FP32	465.69	1.00	558.19	1.00	+16.57%
BF16	307.31	1.52	595.16	0.94	+48.36%
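The PR does not spell out how the derived columns are computed. Back-computing several rows (for example FP32 465.69 µs vs. 558.19 µs giving +16.57%, and 465.69 / 307.31 giving 1.52) suggests the definitions sketched below; treat them as an inference from the numbers, not as the author's stated formulas. The function names are hypothetical.

```cpp
#include <cassert>

// Inferred: "perf vs. develop" = (t_base - t_pr) / t_base, in percent,
// i.e. the fraction of the baseline latency that the PR removes.
double GainVsBaseline(double t_pr_us, double t_base_us) {
  assert(t_base_us > 0.0);
  return (t_base_us - t_pr_us) / t_base_us * 100.0;
}

// Inferred: "speedup over FP32" = t_fp32 / t_dtype within the same build.
double SpeedupOverFp32(double t_fp32_us, double t_dtype_us) {
  assert(t_dtype_us > 0.0);
  return t_fp32_us / t_dtype_us;
}
```

The same relative-gain formula also reproduces the op benchmark row: (35.03 - 24.2) / 35.03 is about 30.92% against develop, and (27.5 - 24.2) / 27.5 is 12% against PyTorch.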

paddle-bot · 2022-09-15T06:35:51Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


JamesLim-sy · 2022-09-17T09:01:35Z

Successfully built in the local Kunlun-KP-Build environment.

AnnaTrainingG · 2022-09-19T01:49:20Z

paddle/phi/kernels/funcs/broadcast_function.h

+                                                  num,
+                                                  block_offset,
+                                                  read_lens,
+                                                  func);
  }
 #else


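Actually, the original idea behind KP was to avoid this kind of check as much as possible; once it is added, it is no different from writing two separate kernels...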

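For the AlphaFold optimization, I really couldn't think of anything else left to optimize... Optimizing a kernel like this, whose computation is simple but whose performance requirements are very high, is like growing flowers in a desert T_T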
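The commit history mentions "refine code with template argument". The trade-off raised in this thread, a check inside a shared loader versus two separate kernels, can be illustrated with a compile-time flag: one source body, two specializations, and a host-side dispatch that runs once per launch instead of once per element. This is a generic C++17 sketch, not the code in broadcast_function.h; `LoadTile`, `Load`, `kNeedIndexMap` and the index-mapping callback are hypothetical stand-ins for the real broadcast address calculation.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: one loader body, two compile-time specializations.
// When kNeedIndexMap is false the input is read contiguously; when true,
// each element's source address goes through the mapping callback.
template <bool kNeedIndexMap, typename IndexFn>
void LoadTile(const float* src, float* dst, int64_t begin, int n,
              [[maybe_unused]] IndexFn map_index) {
  for (int i = 0; i < n; ++i) {
    if constexpr (kNeedIndexMap) {
      dst[i] = src[map_index(begin + i)];  // broadcast: gather via mapping
    } else {
      dst[i] = src[begin + i];  // no broadcast: contiguous read
    }
  }
}

// Host-side dispatch: the runtime branch is taken once, then the chosen
// specialization runs with no per-element check.
template <typename IndexFn>
void Load(bool need_index_map, const float* src, float* dst, int64_t begin,
          int n, IndexFn map_index) {
  assert(n >= 0);
  if (need_index_map) {
    LoadTile<true>(src, dst, begin, n, map_index);
  } else {
    LoadTile<false>(src, dst, begin, n, map_index);
  }
}
```

The compiler still emits two versions of the loader, which is the reviewer's point; what the template buys is a single source body to maintain and no branch inside the hot loop.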

first commit

4659566

JamesLim-sy changed the title ~~first commit~~ Performance fix for broadcast kernel [Part4] Sep 15, 2022

JamesLim-sy added 8 commits September 15, 2022 15:49

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

0c598dc

… optimize_of_broadcastKernel

refine code with template argument

241442a

refine code with template argument

0d9607d

fix push conflicts

0a83893

add ternary broadcast test file

e081325

add ternary broadcast test file

4bb0b21

fix code error in test file

bb0ec51

fix accoriding to ci

d0c6ff3

JamesLim-sy changed the title ~~Performance fix for broadcast kernel [Part4]~~ Performance fix for broadcast kernel [Part3] Sep 17, 2022

fix op-benchmark ci error

c2904ce

AnnaTrainingG approved these changes Sep 19, 2022


AnnaTrainingG reviewed Sep 19, 2022


JamesLim-sy merged commit 46e4fb2 into PaddlePaddle:develop Sep 19, 2022

JamesLim-sy mentioned this pull request Dec 9, 2022

Divide elementwise case from BroadcastKernel and refine transpose autotune #33051

Merged

JamesLim-sy mentioned this pull request Apr 12, 2023

Support different dtypes of inputs for broadcast for dropout optimization #52093

Merged
