Add multi-device execution support in ONNX #6641
Conversation
The newly added proto classes should be imported in `__all__` (line 84 in 5973bd9).
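For concreteness, a minimal sketch of what that edit to `onnx/__init__.py` might look like. The class names below are assumed from this PR's proto changes and are not confirmed exports; adjust to the actual generated classes:

```python
# Sketch of the suggested edit to onnx/__init__.py: re-export the new
# multi-device proto classes so they become part of the public API.
# Class names are assumptions taken from this PR's proto changes.
from onnx.onnx_pb import (
    NodeDeviceConfigurationProto,
    ShardingSpecProto,
    ShardedDimProto,
    SimpleShardedDimProto,
)

__all__ = [
    # ...existing exports...
    "NodeDeviceConfigurationProto",
    "ShardingSpecProto",
    "ShardedDimProto",
    "SimpleShardedDimProto",
]
```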
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##             main    #6641   +/-  ##
=======================================
  Coverage   56.45%   56.45%
=======================================
  Files         509      509
  Lines       32515    32515
  Branches     3057     3057
=======================================
  Hits        18356    18356
  Misses      13334    13334
  Partials      825      825
```

☔ View full report in Codecov by Sentry.
(force-pushed from b62e289 to 759081a)
@justinchuby @gramalingam do you have any insight on the failing checks? Perhaps I need to also add these proto definitions in a few other files?
The proto files need to be auto-generated and updated with `python onnx/gen_proto.py -l`.
The file entry `docs/proposals/images/composing_broadcast_axes.png` needs to be added to https://github.com/onnx/onnx/blob/main/REUSE.toml.
(force-pushed from ad2f94e to 9d1450a)
I see there is a failed check. Maybe run the proto-generation scripts again (after recent changes)?
(force-pushed from 9d1450a to 77c457f)
proto changes lgtm. Thanks!
(force-pushed from 2469f58 to fb114fb)
Hi @kevinch-nv: I think the recent changes you made to …
Sorry, one more suggested change first: can you add a comment line here to document this update to the proto?
@kevinch-nv @justinchuby can you share a sample …
Co-authored with @gramalingam
Description
Updates the ONNX specification to include fields that describe multi-device inference execution.
Motivation and Context
The recent trend toward increasingly large models has spurred interest in distributed inference. A key performance bottleneck for inference with these large models has been the memory limits of GPUs and other accelerators, as well as communication bandwidth. Efficient distributed inference therefore typically requires parallelizing the computation across multiple devices while taking memory and bandwidth into account.
Our goal is to extend ONNX so that it can serve as a representation of a parallelized model. This is driven by the current state-of-the-art techniques used for distributed inference (e.g., see GSPMD: General and Scalable Parallelization for ML Computation Graphs). In particular, two techniques of interest are tensor parallelism and pipelining. In tensor parallelism (also known as horizontal or operator parallelism), the computation of a single operator (node) in the graph is parallelized across multiple devices by sharding its inputs. In pipeline parallelism, different subgraphs are assigned to different devices. A sketch of how these annotations might look follows.
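Below is a hypothetical sketch of annotating a node for 2-way tensor parallelism with the new fields. The message and field names (`NodeDeviceConfigurationProto`, `ShardingSpecProto`, `device_configurations`, `pipeline_stage`, `simple_sharding`) are read off this PR's proto changes and should be treated as assumptions until the spec is finalized; the configuration id `tp_2gpu` is invented for illustration.

```python
# Hypothetical sketch: annotate a MatMul for 2-way tensor parallelism
# using the proto messages added in this PR. Names are assumptions from
# the PR's proto diff, not a finalized API.
import onnx
from onnx import helper

# A single MatMul whose second input (the weight W) is sharded by column.
node = helper.make_node("MatMul", inputs=["X", "W"], outputs=["Y"])

config = onnx.NodeDeviceConfigurationProto()
config.configuration_id = "tp_2gpu"  # invented configuration name
config.pipeline_stage = 0            # this example runs entirely in stage 0

# Shard W along axis 1 into two equal pieces, one per device.
spec = config.sharding_spec.add()
spec.tensor_name = "W"
spec.device.extend([0, 1])
sharded_dim = spec.sharded_dim.add()
sharded_dim.axis = 1
simple = sharded_dim.simple_sharding.add()
simple.num_shards = 2

node.device_configurations.append(config)
```

Under this scheme, a backend that recognizes configuration `tp_2gpu` would split `W` column-wise across devices 0 and 1 and run the two partial MatMuls in parallel; pipeline parallelism would instead assign different `pipeline_stage` values to different nodes.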