【PIR】Distributed Operator Registration under PIR

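## 1. Background
PaddlePaddle is building a new IR system. Under the new IR, Paddle generates its operators from the more standardized dynamic-graph operator definitions (ops.yaml, legacy_ops.yaml). The new IR system must still remain compatible with the old IR. To that end, Paddle provides `ProgramTranslator` (code located under `paddle/fluid/ir_adaptor/translator/`), which translates a computation graph expressed in the old IR into one expressed in the new IR. At present, the core job of `ProgramTranslator` is translating individual `OP`s, that is, translating an `OP` defined under the old IR (usually in the `paddle/fluid/operators` folder) into the operator defined under the new IR.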

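Some distributed operators currently have no definition under the new IR. We need to add definitions for them under the new IR and make sure `ProgramTranslator` can translate them successfully.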
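
To make the per-operator translation idea concrete, here is a toy model in plain Python. It is not how `ProgramTranslator` is implemented (the real code is C++ under `paddle/fluid/ir_adaptor/translator/`); the registry contents, the function names, and the error message are all invented for illustration. The only detail taken from this issue is that translated operators appear under the `pd_op.` prefix.

```python
# Toy model only: names and registry contents are hypothetical.

# Operators that (in this toy) already have a definition under the new IR.
NEW_IR_OPS = {"matmul", "relu"}


def translate_op(legacy_op_type):
    """Map one legacy op name to its new-IR name, or fail if undefined."""
    if legacy_op_type not in NEW_IR_OPS:
        raise ValueError(f"no new-IR definition for {legacy_op_type!r}")
    return f"pd_op.{legacy_op_type}"


def translate_program(legacy_ops):
    """Translate a whole program one op at a time."""
    return [translate_op(op) for op in legacy_ops]


if __name__ == "__main__":
    print(translate_program(["matmul", "relu"]))
    try:
        translate_program(["matmul", "c_reduce_min"])
    except ValueError as err:
        print("translation failed:", err)
    # "Registering" the op makes the same program translatable.
    NEW_IR_OPS.add("c_reduce_min")
    print(translate_program(["matmul", "c_reduce_min"]))
```

In this picture, each task in the table below amounts to adding one missing entry and proving with a unit test that translation then succeeds.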

**The distributed operators to be registered are listed below:**
| No. | Unit test | Claimed by / Status / PR |
| :-: | :-: | :-: |
| 1 | push_sparse_v2 | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#60473](https://github.com/PaddlePaddle/Paddle/pull/60473) | 
| 2 | distributed_push_sparse | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#60805](https://github.com/PaddlePaddle/Paddle/pull/60805) | 
| 3 | c_allreduce_min | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#60584](https://github.com/PaddlePaddle/Paddle/pull/60584) | 
| 4 | global_scatter | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62579](https://github.com/PaddlePaddle/Paddle/pull/62579) | 
| 5 | partial_allgather | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62735](https://github.com/PaddlePaddle/Paddle/pull/62735) | 
| 6 | c_scatter | @DrRyanHuang <img src="https://img.shields.io/badge/状态-报名-2ECC71" /> @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62369](https://github.com/PaddlePaddle/Paddle/pull/62369) | 
| 7 | c_reduce_prod | @DrRyanHuang <img src="https://img.shields.io/badge/状态-报名-2ECC71" /> @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62270](https://github.com/PaddlePaddle/Paddle/pull/62270) | 
| 8 | dgc | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62781](https://github.com/PaddlePaddle/Paddle/pull/62781) | 
| 9 | partial_recv | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62412](https://github.com/PaddlePaddle/Paddle/pull/62412) | 
| 10 | pull_gpups_sparse | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62935](https://github.com/PaddlePaddle/Paddle/pull/62935) | 
| 11 | dgc_momentum | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#63013](https://github.com/PaddlePaddle/Paddle/pull/63013) | 
| 12 | all_reduce | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62634](https://github.com/PaddlePaddle/Paddle/pull/62634) | 
| 13 | partial_send | @Difers <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#60484](https://github.com/PaddlePaddle/Paddle/pull/60484) | 
| 14 | send_and_recv | @Difers <img src="https://img.shields.io/badge/状态-提交PR-F39C12" /> [#62589](https://github.com/PaddlePaddle/Paddle/pull/62589) @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#64203](https://github.com/PaddlePaddle/Paddle/pull/64203) | 
| 15 | push_dense | @Difers <img src="https://img.shields.io/badge/状态-报名-2ECC71" /> @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62505](https://github.com/PaddlePaddle/Paddle/pull/62505) | 
| 16 | c_split | @DrRyanHuang <img src="https://img.shields.io/badge/状态-报名-2ECC71" /> @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62416](https://github.com/PaddlePaddle/Paddle/pull/62416) | 
| 17 | barrier | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62802](https://github.com/PaddlePaddle/Paddle/pull/62802) | 
| 18 | lars_momentum | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#60838](https://github.com/PaddlePaddle/Paddle/pull/60838) | 
| 19 | pull_box_sparse | @LittleNoob2333 <img src="https://img.shields.io/badge/状态-报名-2ECC71" /> @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62982](https://github.com/PaddlePaddle/Paddle/pull/62982) | 
| 20 | global_gather | @Eacient <img src="https://img.shields.io/badge/状态-报名-2ECC71" /> @xingmingyyj <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#63867](https://github.com/PaddlePaddle/Paddle/pull/63867) | 
| 21 | c_allreduce_prod | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#60790](https://github.com/PaddlePaddle/Paddle/pull/60790) | 
| 22 | pull_sparse_v2 | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#63014](https://github.com/PaddlePaddle/Paddle/pull/63014) | 
| 23 | c_reduce_max | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62270](https://github.com/PaddlePaddle/Paddle/pull/62270) | 
| 24 | distributed_lookup_table | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#60911](https://github.com/PaddlePaddle/Paddle/pull/60911) | 
| 25 | distributed_fused_lamb_init | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62050](https://github.com/PaddlePaddle/Paddle/pull/62050) | 
| 26 | limit_by_capacity | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62579](https://github.com/PaddlePaddle/Paddle/pull/62579) | 
| 27 | distributed_fused_lamb | @enkilee <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#61293](https://github.com/PaddlePaddle/Paddle/pull/61293) | 
| 28 | random_routing | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62443](https://github.com/PaddlePaddle/Paddle/pull/62443) [#62781](https://github.com/PaddlePaddle/Paddle/pull/62781) | 
| 29 | prune_gate_by_capacity | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62494](https://github.com/PaddlePaddle/Paddle/pull/62494) | 
| 30 | nop | @xiaoyewww <img src="https://img.shields.io/badge/状态-完成任务-9B59B6" /> [#62541](https://github.com/PaddlePaddle/Paddle/pull/62541) | 

### PR submission template
- PR title
```C++
【PIR Dist Op Reg No.1】 reg c_reduce_min
```
- PR description
```C++
### PR types
Others

### PR changes
Others

### Description


Register operator `c_reduce_min`

```
### How to claim a task
Please claim tasks by leaving a comment (the tag `【报名】` means "sign up"), for example:
```
【报名】：1、3、12-13
```
Separate multiple tasks with the **Chinese enumeration comma** (`、`); a run of consecutive tasks can be written with a hyphen, e.g. 2-5.
PR submission format: start the PR title with 【PIR Dist Op Reg No.xxx】, as in the template above, to indicate the task number.
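
The claim comment format described above is regular enough to parse mechanically. The following is a minimal sketch; `parse_claim` is a hypothetical helper, not a tool used by the maintainers, and accepting an ASCII colon in addition to the full-width one is an assumption.

```python
import re


def parse_claim(comment):
    """Return the sorted task numbers named in a claim comment."""
    # Accept the full-width colon from the example; ASCII ":" is an assumption.
    match = re.match(r"\s*【报名】[：:]\s*(.+)", comment)
    if match is None:
        raise ValueError("comment does not start with the 【报名】 tag")
    tasks = set()
    # Tasks are separated by the Chinese enumeration comma.
    for item in match.group(1).strip().split("、"):
        item = item.strip()
        if "-" in item:
            # A hyphen denotes an inclusive range of consecutive tasks.
            start, end = (int(part) for part in item.split("-"))
            tasks.update(range(start, end + 1))
        else:
            tasks.add(int(item))
    return sorted(tasks)


print(parse_claim("【报名】：1、3、12-13"))  # [1, 3, 12, 13]
```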

## Dashboard
| Track | Tasks | Submitted / Claimed | Submission rate | Completed | Completion rate |
| :----: | :----: | :----: | :----: | :----: | :----: |
| Happy Open Source (快乐开源) | 30 | 29 / 29 | 96.67% | 29 | 96.67% |

## 2. Tutorial
The main work of each task can be divided into three parts:
- Register the operator
- Write a unit test
- Modify test/ir/pir/translator/CMakeLists.txt

Each part is described below.
### 2.1 Registering the operator
For the operator registration steps, see the `二、Tutorial` section (Section 2, Tutorial) of #59382.

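### 2.2 Writing the unit test
To verify that a newly registered distributed operator can be translated successfully, we need to write a unit test.

First, all unit tests must be placed in the `test/ir/pir/translator` folder and must inherit from `TestOpTranslator` or `TestOpWithBackwardTranslator`. A test that only registers a forward operator inherits from `TestOpTranslator`; when forward and backward operators are registered together, it inherits from `TestOpWithBackwardTranslator`.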
```Python
import unittest

import paddle
from paddle import pir
from paddle.base import core

paddle.enable_static()


class TestOpTranslator(unittest.TestCase):
    def setUp(self):
        self.place = core.Place()
        self.place.set_place(paddle.CPUPlace())
        self.new_scope = paddle.static.Scope()
        self.main_program = paddle.static.Program()

    def append_op(self):
        # Subclasses override this to add the op under test to the program.
        raise Exception("Define the op to be tested here!")

    def build_model(self):
        with paddle.static.scope_guard(self.new_scope):
            with paddle.static.program_guard(self.main_program):
                self.append_op()

    def check(self):
        self.build_model()
        # Translate the old-IR program, then look for the op in its printed form.
        pir_program = pir.translate_to_pir(self.main_program.desc)
        assert hasattr(self, "op_type"), "Op_type should be specified!"
        assert self.op_type in str(pir_program), (
            self.op_type
            + " should be translated to pd_op."
            + self.op_type
            + '!'
        )
```
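When inheriting from `TestOpTranslator`, override the `append_op` method to add the `Op` under test while building the network. The main idea of `check` is to use `ProgramTranslator` to translate the computation graph expressed in the old IR into one expressed in the new IR, and then print the new-IR graph; if the printed graph contains the Op being registered, the translation succeeded.
Class names consistently follow the form `TestXXXOpTranslator`, for example: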
```Python
import unittest

import test_op_transcriber  # module that defines TestOpTranslator

import paddle
from paddle.base.layer_helper import LayerHelper

paddle.enable_static()


class TestCReduceMinOpTranslator(test_op_transcriber.TestOpTranslator):
    def append_op(self):
        self.op_type = "c_reduce_min"
        x = paddle.ones(shape=(100, 2, 3), dtype='float32')
        y = paddle.ones(shape=(100, 2, 3), dtype='float32')
        attrs = {'ring_id': 0, 'root_id': 0, 'use_calc_stream': False}
        helper = LayerHelper(self.op_type)
        # Append the legacy op directly so the translator has something to translate.
        helper.append_op(
            type=self.op_type,
            inputs={"X": x},
            outputs={"Out": y},
            attrs=attrs,
        )

    def test_translator(self):
        self.check()


if __name__ == "__main__":
    unittest.main()
```
The code above is an example test for `c_reduce_min`.
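
The `check` method boils down to a substring test on the printed program. The sketch below isolates that logic without Paddle, next to a stricter variant. The sample text is invented and only imitates a printed program (the real PIR print format may differ), and `strictly_translated` is a hypothetical alternative, not something the test base class provides.

```python
import re

# Invented stand-in for the text obtained by printing a translated program.
SAMPLE_PROGRAM_TEXT = """
(%0) = "pd_op.full" () : () -> tensor<100x2x3xf32>
(%1) = "pd_op.dgc_momentum" (%0) : (tensor<100x2x3xf32>) -> tensor<100x2x3xf32>
"""


def loosely_translated(op_type, program_text):
    """The plain substring test that `check` performs."""
    return op_type in program_text


def strictly_translated(op_type, program_text):
    """Stricter variant: require `pd_op.<op_type>` as a whole name."""
    pattern = r"pd_op\." + re.escape(op_type) + r"(?![A-Za-z0-9_])"
    return re.search(pattern, program_text) is not None


print(loosely_translated("dgc_momentum", SAMPLE_PROGRAM_TEXT))   # True
print(loosely_translated("dgc", SAMPLE_PROGRAM_TEXT))            # True (prefix match)
print(strictly_translated("dgc", SAMPLE_PROGRAM_TEXT))           # False
print(strictly_translated("dgc_momentum", SAMPLE_PROGRAM_TEXT))  # True
```

The substring test is simple and sufficient when each test program contains only the op under test; the stricter form matters only if one op name is a prefix of another op that also appears in the program.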

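### 2.3 Modifying test/ir/pir/translator/CMakeLists.txt
Because the operators being registered are distributed operators, they are not compiled or registered unless the `WITH_DISTRIBUTE` build option is enabled. As a result, even after completing the steps above, some CI jobs may still hit the following error: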
```
ValueError: Operator "xxx" has not been registered.
```
The fix is to modify the CMakeLists file.
```cmake
file(
  GLOB TEST_INTERP_CASES
  RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
  "test_*.py")
string(REPLACE ".py" "" TEST_INTERP_CASES "${TEST_INTERP_CASES}")

set(DISTRIBUTED_OP_TRANSLATOR_TEST test_c_reduce_min_translator)

if(NOT WITH_DISTRIBUTE)
  list(REMOVE_ITEM TEST_INTERP_CASES ${DISTRIBUTED_OP_TRANSLATOR_TEST})
endif()

foreach(target ${TEST_INTERP_CASES})
  py_test_modules(${target} MODULES ${target})
endforeach()
```
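As shown, `DISTRIBUTED_OP_TRANSLATOR_TEST` records the unit tests that correspond to distributed operators. When the `WITH_DISTRIBUTE` option is off, these tests are removed from `TEST_INTERP_CASES`, so CI does not run them.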
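
The selection logic of the CMake snippet can be mirrored in a few lines of Python, which may help when reasoning about which tests a given CI configuration will run. This is only a model of the `if(NOT WITH_DISTRIBUTE)` / `list(REMOVE_ITEM ...)` behavior; `select_tests` and the example lists are hypothetical.

```python
def select_tests(all_tests, distributed_tests, with_distribute):
    """Mimic list(REMOVE_ITEM ...) guarded by if(NOT WITH_DISTRIBUTE)."""
    if with_distribute:
        return list(all_tests)
    excluded = set(distributed_tests)
    return [name for name in all_tests if name not in excluded]


# Hypothetical glob result for "test_*.py" with the ".py" suffix stripped.
all_tests = ["test_op_transcriber", "test_c_reduce_min_translator"]
distributed = ["test_c_reduce_min_translator"]

print(select_tests(all_tests, distributed, with_distribute=True))
print(select_tests(all_tests, distributed, with_distribute=False))
```

With `with_distribute=False`, only the non-distributed test remains, which is why a distributed test missing from the set still runs and fails on builds without `WITH_DISTRIBUTE`.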
Taking the `c_allreduce_min` operator as an example, the corresponding test name is `test_c_allreduce_min_translator`, so:
```cmake
set(DISTRIBUTED_OP_TRANSLATOR_TEST test_c_reduce_min_translator
    test_c_allreduce_min_translator)
```
Adding the corresponding test name to this set is all that is needed.
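
The names used in this tutorial follow a predictable pattern: `c_reduce_min` gives the class `TestCReduceMinOpTranslator`, and `c_allreduce_min` gives the test `test_c_allreduce_min_translator`. A small sketch that derives both, and renders the `set(...)` command, is given here; the helper names are hypothetical, and it is an assumption that every operator in the table follows exactly this pattern.

```python
def class_name_for(op_type):
    """c_reduce_min -> TestCReduceMinOpTranslator"""
    camel = "".join(part.capitalize() for part in op_type.split("_"))
    return f"Test{camel}OpTranslator"


def module_name_for(op_type):
    """c_allreduce_min -> test_c_allreduce_min_translator"""
    return f"test_{op_type}_translator"


def render_cmake_set(op_types):
    """Render the set(DISTRIBUTED_OP_TRANSLATOR_TEST ...) command."""
    names = [module_name_for(op) for op in op_types]
    separator = "\n" + " " * 4
    return "set(DISTRIBUTED_OP_TRANSLATOR_TEST " + separator.join(names) + ")"


print(class_name_for("c_reduce_min"))
print(render_cmake_set(["c_reduce_min", "c_allreduce_min"]))
```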
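## 3. Q&A
#### 1. Where should the backward operator be defined?
**A**: It depends on where the forward operator is defined. If the forward operator is defined in paddle/phi/api/yaml/ops.yaml, the backward operator must be defined in paddle/phi/api/yaml/backward.yaml. If the forward operator is defined in paddle/fluid/pir/dialect/operator/ir/ops.yaml, define the backward operator in paddle/fluid/pir/dialect/operator/ir/ops_backward.yaml.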
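
The rule in this answer is a two-entry lookup. As a sketch, it can be written down as follows; `backward_yaml_for` is a hypothetical helper, and forward files other than the two named above are deliberately rejected because the answer gives no rule for them.

```python
# Forward-definition file -> file where the backward op should be defined,
# exactly as stated in the answer above.
BACKWARD_YAML_FOR = {
    "paddle/phi/api/yaml/ops.yaml": "paddle/phi/api/yaml/backward.yaml",
    "paddle/fluid/pir/dialect/operator/ir/ops.yaml": (
        "paddle/fluid/pir/dialect/operator/ir/ops_backward.yaml"
    ),
}


def backward_yaml_for(forward_yaml):
    """Return the backward yaml that pairs with a forward yaml."""
    try:
        return BACKWARD_YAML_FOR[forward_yaml]
    except KeyError:
        raise ValueError(f"no rule stated for {forward_yaml!r}") from None


print(backward_yaml_for("paddle/phi/api/yaml/ops.yaml"))
```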

## Statistics
> In no particular order: @enkilee (12) @xiaoyewww (15) @Difers (1) @xingmingyyj (1)