Enable float8e8m0 for Q/DQ, and other ops #7120
Conversation
Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
Code scanning / CodeQL raised an "Unused static variable" notice on the `ONNX_OPERATOR_SET_SCHEMA(...)` registrations for each of: If, Loop, Scan, Constant, ConstantOfShape, Transpose, Squeeze, Unsqueeze, Identity, Pad.
Pull Request Overview
This PR enables the float8e8m0 data type for various tensor and quantization operators in opset 24, including casting, quantization/dequantization, and control-flow and shape-manipulation ops.
- Registers new version-conversion adapters and type restrictions for float8e8m0 across opset 23→24 and 24→23
- Updates operator schemas (CastLike, QuantizeLinear, DequantizeLinear, etc.) to version 24, adding a `round_mode` attribute and expanding type constraints to IRv12
- Refreshes documentation to include opset 24 versions and float8e8m0 support
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
version_converter/convert.h | Registers CompatibleAdapter and TypeRestriction for new ops and float8e8m0 |
defs/tensor/old.cc | Adds version 23 schemas for CastLike, Reshape, Shape, Size, Transpose, Squeeze, Unsqueeze |
defs/tensor/defs.cc | Updates those ops to version 24, expands type constraints to IRv12, adds round_mode |
defs/quantization/old.cc | Introduces float8e8m0 in QuantizeLinear ver23 schema |
defs/quantization/defs.cc | Upgrades QuantizeLinear & DequantizeLinear to ver24 and expands type constraints |
defs/operator_sets.h | Adds forward declarations and schema iteration for new ops in version 24 |
defs/nn/old.cc | Adds Flatten ver23 schema |
defs/nn/defs.cc | Upgrades Flatten to ver24, expands type constraints |
defs/generator/old.cc | Adds Constant and ConstantOfShape ver23 schema |
defs/generator/defs.cc | Upgrades Constant/ConstantOfShape to ver24, expands types |
defs/controlflow/old.cc | Adds If, Loop, Scan ver23 schemas |
defs/controlflow/defs.cc | Upgrades If, Loop, Scan to ver24, updates control_flow_types_ir12 |
docs/Operators.md | Updates operator index and version links to include opset 24 |
Comments suppressed due to low confidence (3)
onnx/defs/quantization/defs.cc:103
- The new float8e8m0 type is added here; consider adding unit tests for QuantizeLinear and DequantizeLinear with float8e8m0 to verify correct behavior and inference.
{"tensor(float)", "tensor(float16)", "tensor(bfloat16)", "tensor(int32)", "tensor(float8e8m0)"},
onnx/version_converter/convert.h:833
- [nitpick] Consider refactoring the repeated registerAdapter calls into a loop over operator names to reduce duplication and simplify future maintenance.
registerAdapter(std::make_unique<CompatibleAdapter>("CastLike", OpSetID(23), OpSetID(24)));
onnx/defs/tensor/old.cc:2531
- [nitpick] This call duplicates the earlier propagateElemTypeFromInputToOutput invocation at the top of the function. Removing the duplicate could simplify the inference logic.
propagateElemTypeFromInputToOutput(ctx, 0, 0);
LGTM. Can you update tests for CastLike? Would be good to have a second review.
Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
@justinchuby seems like there's a lint error caught in the pipeline. It's unclear to me what the error message is referring to and it's also not reproducible locally. Any thoughts?
I don't see a lint error, just a 404. Maybe retry?
One thing I forgot: Lines 696 to 697 in eef5187
Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
Thanks. Updated the comment.
Pull request was closed
Description
This is a follow-up PR to #7030, which added the float8e8m0 dtype and updated the Cast op.
This PR enables float8e8m0 for the following ops in opset 24: CastLike, QuantizeLinear, DequantizeLinear, Reshape, Shape, Size, Flatten, Transpose, Squeeze, Unsqueeze, Identity, Pad, Constant, ConstantOfShape, If, Loop, and Scan.
Motivation and Context