Skip to content

Conversation

seiriosPlus
Copy link
Collaborator

No description provided.

@seiriosPlus
Copy link
Collaborator Author

fireshot capture 7 - paddlepaddle - docs_ - http___yq01-gpu-88-149-23-01 epc ba

@shanyi15 shanyi15 added API Guide docs related to API Guide 使用文档 and removed API Guide docs related to API Guide labels Nov 27, 2018

预测所用的模型与参数的保存:
##################
预测引擎提供了存储预测模型 :code:`fluid.io.save_inference_model` 和加载预测模型 :code:`fluid.io.load_inference_model` 两个接口。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

预测引擎提供了 => Fluid提供了...保存模型配置和参数用于预测使用

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fluid提供了预测所需的“保存预测模型”和“加载预测模型”两个接口:存储预测模型 :code:fluid.io.save_inference_model 和加载预测模型 :code:fluid.io.load_inference_model


.. code-block:: python
多机增量(不带分布式大规模稀疏矩阵)训练的一般步骤为:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不带比较口语,改成“不启用”

│ ├── epoch_id
│ └── step_id
└── checkpoint_1 (第二次保存)
1. 在0号trainer在训练的最后调用 :code:`fluid.io.save_persistables` 保存持久性参数到指定的 :code:`path` 下。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在0号 => 0号


多机checkpoint保存
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

增量和checkpoint还是有区别的,checkpoint如何存呢?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

主要是checkpoint的API移动到了 contrib中不能在文档中暴露,checkpoint和增量并无并无本质区别, 可以跟增量一样的思路进行存取

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

其实不太一样,checkpoint应该把一些非persistable的var也存下来,比如adam的beta_1_pow之类的

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的,确实需要~ 这个问题在目前的多机下会复杂一些,我会考虑解决这个问题

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seiriosPlus @typhoonzero 请问这是要在代码中解决么?文档方面要如何做呢?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 目前的checkpoint存下在的参数不够完整,这个我后续在代码中解决。

  2. 文档中的话目前只提及到增量训练, 目前的checkpoint的流程和做法和增量训练一致。

│ ├── epoch_id
│ └── step_id
└── checkpoint_1 (第二次保存)
1. 在0号trainer在训练的最后调用 :code:`fluid.io.save_persistables` 保存持久性参数到指定的 :code:`path` 下。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

有些Trainer首字母是大写的,有些是小写的,请统一~

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的~

@shanyi15
Copy link
Contributor

shanyi15 commented Dec 3, 2018

Done

@@ -100,7 +100,7 @@

预测所用的模型与参数的保存:
##################
预测引擎提供了存储预测模型 :code:`fluid.io.save_inference_model` 和加载预测模型 :code:`fluid.io.load_inference_model` 两个接口。
Fluid提供了预测所需的“保存预测模型”和“加载预测模型”两个接口:存储预测模型 :ref:`fluid.io.save_inference_model` 和加载预测模型 :ref:`fluid.io.load_inference_model`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这句不用修改。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是看了这句~#401 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已恢复

@luotao1
Copy link
Collaborator

luotao1 commented Dec 3, 2018

LGTM

@shanyi15 shanyi15 merged commit 546f543 into PaddlePaddle:develop Dec 3, 2018
RichardWooSJTU pushed a commit to RichardWooSJTU/docs that referenced this pull request Apr 8, 2022
* Squashed commit of the following:

commit d738487089e41c22b3b1cd73aa7c1c40320a6ebf
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:33:38 2020 +0700

    Adding world_size

    Reduce calls to torch.distributed. For use in create_dataloader.

commit e742dd9619d29306c7541821238d3d7cddcdc508
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 15:38:48 2020 +0800

    Make SyncBN a choice

commit e90d4004387e6103fecad745f8cbc2edc918e906
Merge: 5bf8beb cd90360
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 15:32:10 2020 +0800

    Merge pull request PaddlePaddle#6 from NanoCode012/patch-5

    Update train.py

commit cd9036017e7f8bd519a8b62adab0f47ea67f4962
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 13:39:29 2020 +0700

    Update train.py

    Remove redundant `opt.` prefix.

commit 5bf8bebe8873afb18b762fe1f409aca116fac073
Merge: c9558a9 a1c8406
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 14:09:51 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit c9558a9b51547febb03d9c1ca42e2ef0fc15bb31
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 13:51:34 2020 +0800

    Add device allocation for loss compute

commit 4f08c692fb5e943a89e0ee354ef6c80a50eeb28d
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:16:27 2020 +0800

    Revert drop_last

commit 1dabe33a5a223b758cc761fc8741c6224205a34b
Merge: a1ce9b1 4b8450b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:49 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit a1ce9b1e96b71d7fcb9d3e8143013eb8cebe5e27
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:21 2020 +0800

    fix lr warning

commit 4b8450b46db76e5e58cd95df965d4736077cfb0e
Merge: b9a50ae 02c63ef
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Wed Jul 8 21:24:24 2020 +0800

    Merge pull request PaddlePaddle#4 from NanoCode012/patch-4

    Add drop_last for multi gpu

commit 02c63ef81cf98b28b10344fe2cce08a03b143941
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Wed Jul 8 10:08:30 2020 +0700

    Add drop_last for multi gpu

commit b9a50aed48ab1536f94d49269977e2accd67748f
Merge: ec2dc6c 121d90b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:48:04 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit ec2dc6cc56de43ddff939e14c450672d0fbf9b3d
Merge: d0326e3 82a6182
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:34:31 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit d0326e398dfeeeac611ccc64198d4fe91b7aa969
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:31:24 2020 +0800

    Add SyncBN

commit 82a6182b3ad0689a4432b631b438004e5acb3b74
Merge: 96fa40a 050b2a5
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 7 19:21:01 2020 +0800

    Merge pull request PaddlePaddle#1 from NanoCode012/patch-2

    Convert BatchNorm to SyncBatchNorm

commit 050b2a5a79a89c9405854d439a1f70f892139b1c
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:38:14 2020 +0700

    Add cleanup for process_group

commit 2aa330139f3cc1237aeb3132245ed7e5d6da1683
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:07:40 2020 +0700

    Remove apex.parallel. Use torch.nn.parallel

    For future compatibility

commit 77c8e27e603bea9a69e7647587ca8d509dc1990d
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 01:54:39 2020 +0700

    Convert BatchNorm to SyncBatchNorm

commit 96fa40a3a925e4ffd815fe329e1b5181ec92adc8
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 21:53:56 2020 +0800

    Fix the datset inconsistency problem

commit 16e7c269d062c8d16c4d4ff70cc80fd87935dc95
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 11:34:03 2020 +0800

    Add loss multiplication to preserver the single-process performance

commit e83805563065ffd2e38f85abe008fc662cc17909
Merge: 625bb49 3bdea3f
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Fri Jul 3 20:56:30 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit 625bb49f4e52d781143fea0af36d14e5be8b040c
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 2 22:45:15 2020 +0800

    DDP established

* Squashed commit of the following:

commit 94147314e559a6bdd13cb9de62490d385c27596f
Merge: 65157e2 37acbdc
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 16 14:00:17 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov4 into feature/DDP_fixed

commit 37acbdc
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 20:03:41 2020 -0700

    update test.py --save-txt

commit b8c2da4
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 20:00:48 2020 -0700

    update test.py --save-txt

commit 65157e2fc97d371bc576e18b424e130eb3026917
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:44:13 2020 +0800

    Revert the README.md removal

commit 1c802bfa503623661d8617ca3f259835d27c5345
Merge: cd55b44 0f3b8bb
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:43:38 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit cd55b445c4dcd8003ff4b0b46b64adf7c16e5ce7
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:42:33 2020 +0800

    fix the DDP performance deterioration bug.

commit 0f3b8bb1fae5885474ba861bbbd1924fb622ee93
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 00:28:53 2020 -0700

    Delete README.md

commit f5921ba1e35475f24b062456a890238cb7a3cf94
Merge: 85ab2f3 bd3fdbb
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 11:20:17 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit bd3fdbbf1b08ef87931eef49fa8340621caa7e87
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Tue Jul 14 18:38:20 2020 -0700

    Update README.md

commit c1a97a7767ccb2aa9afc7a5e72fd159e7c62ec02
Merge: 2bf86b8 f796708
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Tue Jul 14 18:36:53 2020 -0700

    Merge branch 'master' into feature/DDP_fixed

commit 2bf86b892fa2fd712f6530903a0d9b8533d7447a
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 22:18:15 2020 +0700

    Fixed world_size not found when called from test

commit 85ab2f38cdda28b61ad15a3a5a14c3aafb620dc8
Merge: 5a19011 c8357ad
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 22:19:58 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit 5a19011949398d06e744d8d5521ab4e6dfa06ab7
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 22:19:15 2020 +0800

    Add assertion for <=2 gpus DDP

commit c8357ad5b15a0e6aeef4d7fe67ca9637f7322a4d
Merge: e742dd9 787582f
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 22:10:02 2020 +0800

    Merge pull request PaddlePaddle#8 from MagicFrogSJTU/NanoCode012-patch-1

    Modify number of dataloaders' workers

commit 787582f97251834f955ef05a77072b8c673a8397
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 20:38:58 2020 +0700

    Fixed issue with single gpu not having world_size

commit 63648925288d63a21174a4dd28f92dbfebfeb75a
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 19:16:15 2020 +0700

    Add assert message for clarification

    Clarify why assertion was thrown to users

commit 69364d6050e048d0d8834e0f30ce84da3f6a13f3
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:36:48 2020 +0700

    Changed number of workers check

commit d738487089e41c22b3b1cd73aa7c1c40320a6ebf
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:33:38 2020 +0700

    Adding world_size

    Reduce calls to torch.distributed. For use in create_dataloader.

commit e742dd9619d29306c7541821238d3d7cddcdc508
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 15:38:48 2020 +0800

    Make SyncBN a choice

commit e90d4004387e6103fecad745f8cbc2edc918e906
Merge: 5bf8beb cd90360
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 15:32:10 2020 +0800

    Merge pull request PaddlePaddle#6 from NanoCode012/patch-5

    Update train.py

commit cd9036017e7f8bd519a8b62adab0f47ea67f4962
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 13:39:29 2020 +0700

    Update train.py

    Remove redundant `opt.` prefix.

commit 5bf8bebe8873afb18b762fe1f409aca116fac073
Merge: c9558a9 a1c8406
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 14:09:51 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit c9558a9b51547febb03d9c1ca42e2ef0fc15bb31
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 13:51:34 2020 +0800

    Add device allocation for loss compute

commit 4f08c692fb5e943a89e0ee354ef6c80a50eeb28d
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:16:27 2020 +0800

    Revert drop_last

commit 1dabe33a5a223b758cc761fc8741c6224205a34b
Merge: a1ce9b1 4b8450b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:49 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit a1ce9b1e96b71d7fcb9d3e8143013eb8cebe5e27
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:21 2020 +0800

    fix lr warning

commit 4b8450b46db76e5e58cd95df965d4736077cfb0e
Merge: b9a50ae 02c63ef
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Wed Jul 8 21:24:24 2020 +0800

    Merge pull request PaddlePaddle#4 from NanoCode012/patch-4

    Add drop_last for multi gpu

commit 02c63ef81cf98b28b10344fe2cce08a03b143941
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Wed Jul 8 10:08:30 2020 +0700

    Add drop_last for multi gpu

commit b9a50aed48ab1536f94d49269977e2accd67748f
Merge: ec2dc6c 121d90b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:48:04 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit ec2dc6cc56de43ddff939e14c450672d0fbf9b3d
Merge: d0326e3 82a6182
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:34:31 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit d0326e398dfeeeac611ccc64198d4fe91b7aa969
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:31:24 2020 +0800

    Add SyncBN

commit 82a6182b3ad0689a4432b631b438004e5acb3b74
Merge: 96fa40a 050b2a5
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 7 19:21:01 2020 +0800

    Merge pull request PaddlePaddle#1 from NanoCode012/patch-2

    Convert BatchNorm to SyncBatchNorm

commit 050b2a5a79a89c9405854d439a1f70f892139b1c
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:38:14 2020 +0700

    Add cleanup for process_group

commit 2aa330139f3cc1237aeb3132245ed7e5d6da1683
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:07:40 2020 +0700

    Remove apex.parallel. Use torch.nn.parallel

    For future compatibility

commit 77c8e27e603bea9a69e7647587ca8d509dc1990d
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 01:54:39 2020 +0700

    Convert BatchNorm to SyncBatchNorm

commit 96fa40a3a925e4ffd815fe329e1b5181ec92adc8
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 21:53:56 2020 +0800

    Fix the datset inconsistency problem

commit 16e7c269d062c8d16c4d4ff70cc80fd87935dc95
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 11:34:03 2020 +0800

    Add loss multiplication to preserver the single-process performance

commit e83805563065ffd2e38f85abe008fc662cc17909
Merge: 625bb49 3bdea3f
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Fri Jul 3 20:56:30 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit 625bb49f4e52d781143fea0af36d14e5be8b040c
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 2 22:45:15 2020 +0800

    DDP established

* Fixed destroy_process_group in DP mode

* Update torch_utils.py

* Update utils.py

Revert build_targets() to current master.

* Update datasets.py

* Fixed world_size attribute not found

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants