Fix custom op multi-gpu scaling #9283

piiswrong · 2018-01-02T21:48:20Z

Description

(Brief description on what this PR is about)

Checklist

Essentials

Passed code style checking (make lint)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

eric-haibin-lin · 2018-01-03T21:20:59Z

src/operator/custom/custom-inl.h

+
+        std::vector<Engine::VarHandle> vars;
+        for (const auto& i : arrs) vars.push_back(i.var());
+        Engine::Get()->PushSync([=](RunContext rctx) {


Will the worker thread and the main thread push operations to the engine at the same time?

piiswrong: That's fine. The engine is thread-safe.
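A minimal C++ sketch of what "the engine is thread-safe" buys here: two threads push work into one mutex-protected queue at the same time and no push is lost. `ToyEngine`, `PushWithVars`, and `PushFromTwoThreads` are hypothetical names, and `ToyEngine` is a toy stand-in, not MXNet's `Engine`. It also shows why the diff's `[=]` capture matters: the queued closure keeps its own copy of `vars` after the pushing function returns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread-safe engine; this is NOT MXNet's
// Engine API. Push() may be called from any thread because a mutex
// serializes access to the list of pending operations.
class ToyEngine {
 public:
  void Push(std::function<void()> op) {
    std::lock_guard<std::mutex> guard(mu_);
    pending_.push_back(std::move(op));
  }

  // Runs every queued operation on the calling thread; returns how many ran.
  std::size_t RunAll() {
    std::vector<std::function<void()>> ops;
    {
      std::lock_guard<std::mutex> guard(mu_);
      ops.swap(pending_);
    }
    for (auto& op : ops) op();
    return ops.size();
  }

 private:
  std::mutex mu_;
  std::vector<std::function<void()>> pending_;
};

// Hypothetical helper mirroring the diff: collect handles into a local
// vector, then capture it by value ([=]) so the queued closure owns a copy
// that outlives this stack frame.
void PushWithVars(ToyEngine& engine, std::mutex& out_mu, std::vector<int>& out, int tag) {
  std::vector<int> vars = {tag, tag + 1};  // stands in for engine var handles
  engine.Push([=, &out_mu, &out]() {
    std::lock_guard<std::mutex> guard(out_mu);
    for (int v : vars) out.push_back(v);
  });
}  // `vars` is destroyed here; the closure still holds its own copy.

// A "worker" thread and the main thread push concurrently.
// Returns {operations run, values recorded}.
std::pair<std::size_t, std::size_t> PushFromTwoThreads(int per_thread) {
  ToyEngine engine;
  std::mutex out_mu;
  std::vector<int> out;
  std::thread worker([&] {
    for (int i = 0; i < per_thread; ++i) PushWithVars(engine, out_mu, out, 0);
  });
  for (int i = 0; i < per_thread; ++i) PushWithVars(engine, out_mu, out, 10);
  worker.join();
  std::size_t ran = engine.RunAll();
  return {ran, out.size()};
}
```

Calling `PushFromTwoThreads(500)` runs all 1000 operations and records 2000 values. In MXNet the variable handles passed with a push are used for dependency tracking between operations; this toy does not model that.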

marcoabreu · 2018-01-04T10:09:26Z

Just FYI: every GPU instance has two GPUs, so feel free to write a test for this case.

anirudh2290 · 2018-01-05T02:40:05Z

src/operator/custom/custom-inl.h

+            cv_.wait(lock, [&] {return !q_.empty() || destructing_;});
+            while (!q_.empty()) {
+              auto fn = q_.front();
+              lock.unlock();


What happens if something is pushed to the queue after we unlock? Will we still pop the correct item from the queue?

piiswrong: There is only one consumer thread, so it's OK.
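A minimal C++ sketch of the single-consumer pattern under review, assuming the loop re-locks and pops after running the task (the diff hunk is truncated, so that part is an assumption). `SingleConsumerQueue`, `RunInOrder`, and `CountFromProducers` are hypothetical; only the member names `q_`, `cv_`, and `destructing_` come from the diff. Producers only append to the back and exactly one thread ever pops, so the front element cannot change while the lock is released, and the pop after re-locking removes the task that just ran.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical single-consumer work queue in the style of the diff;
// not the MXNet implementation.
class SingleConsumerQueue {
 public:
  SingleConsumerQueue() { worker_ = std::thread([this] { Loop(); }); }

  ~SingleConsumerQueue() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      destructing_ = true;
    }
    cv_.notify_all();
    worker_.join();  // the worker drains remaining tasks before exiting
  }

  // Producers (any thread) only ever append to the back.
  void Push(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      q_.push(std::move(fn));
    }
    cv_.notify_one();
  }

 private:
  void Loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!q_.empty() || !destructing_) {
      cv_.wait(lock, [&] { return !q_.empty() || destructing_; });
      while (!q_.empty()) {
        auto fn = q_.front();  // copy the task while holding the lock
        lock.unlock();         // run it without blocking producers
        fn();
        lock.lock();
        // Only this thread pops, so the front is still the task we just ran,
        // even if producers appended more items while we were unlocked.
        q_.pop();
      }
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> q_;
  bool destructing_ = false;
  std::thread worker_;
};

// One producer: the single consumer runs tasks in FIFO order.
std::vector<int> RunInOrder(int n) {
  std::vector<int> order;  // only written by the consumer thread
  {
    SingleConsumerQueue queue;
    for (int i = 0; i < n; ++i) queue.Push([i, &order] { order.push_back(i); });
  }  // destructor drains the queue and joins the worker
  return order;
}

// Several producers push concurrently; every task still runs exactly once.
int CountFromProducers(int producers, int per_producer) {
  int count = 0;  // only incremented on the consumer thread
  {
    SingleConsumerQueue queue;
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p)
      threads.emplace_back([&] {
        for (int i = 0; i < per_producer; ++i) queue.Push([&count] { ++count; });
      });
    for (auto& t : threads) t.join();
  }
  return count;
}
```

`RunInOrder(100)` yields 0 through 99 in order, and `CountFromProducers(4, 250)` runs all 1000 tasks. Releasing the lock while `fn()` runs keeps producers from stalling behind a long task; the reasoning above only holds while there is a single consumer thread.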

* refactor custom op
* fix
* fix
* fix
* fix

piiswrong added 4 commits December 28, 2017 01:19

refactor custom op

a237660

fix

b2bff93

fix

bd65fd8

fix

a951d0d

chinakook mentioned this pull request Jan 3, 2018

The custom op multi-gpu capability will be solved! msracver/Deformable-ConvNets#137

Closed

fix

fc1bee5

piiswrong mentioned this pull request Jan 3, 2018

Fix custom op - infer_storage_type_backward #8738

Closed


eric-haibin-lin reviewed Jan 3, 2018


szha added the Bug label Jan 4, 2018

anirudh2290 reviewed Jan 5, 2018


piiswrong merged commit 004dead into apache:master Jan 5, 2018

szha pushed a commit that referenced this pull request Jan 5, 2018

Fix custom op multi-gpu scaling (#9283)

fe80b1c


yuxiangw pushed a commit to yuxiangw/incubator-mxnet that referenced this pull request Jan 25, 2018

Fix custom op multi-gpu scaling (apache#9283)

bd8bb60


rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018

Fix custom op multi-gpu scaling (apache#9283)

4d7c8dd


zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018

Fix custom op multi-gpu scaling (apache#9283)

1d82ea4


evorta mentioned this pull request Jun 16, 2019

msracver/Deformable-ConvNets#261

Closed
