Enable float8e8m0 for Q/DQ, and other ops #7120
Conversation
Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
Code scanning / CodeQL raised an "Unused static variable" notice on the `ONNX_OPERATOR_SET_SCHEMA(...)` registrations for each of: If, Loop, Scan, Constant, ConstantOfShape, Transpose, Squeeze, Unsqueeze, Identity, Pad.
Pull Request Overview
This PR enables the float8e8m0 data type for various tensor and quantization operators in opset 24, including casting, quantization/dequantization, and control-flow and shape-manipulation ops.
- Registers new version-conversion adapters and type restrictions for float8e8m0 across opset 23→24 and 24→23
- Updates operator schemas (CastLike, QuantizeLinear, DequantizeLinear, etc.) to version 24, adding a `round_mode` attribute and expanding type constraints to IRv12
- Refreshes documentation to include opset 24 versions and float8e8m0 support
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
version_converter/convert.h | Registers CompatibleAdapter and TypeRestriction for new ops and float8e8m0 |
defs/tensor/old.cc | Adds version 23 schemas for CastLike, Reshape, Shape, Size, Transpose, Squeeze, Unsqueeze |
defs/tensor/defs.cc | Updates those ops to version 24, expands type constraints to IRv12, adds round_mode |
defs/quantization/old.cc | Introduces float8e8m0 in QuantizeLinear ver23 schema |
defs/quantization/defs.cc | Upgrades QuantizeLinear & DequantizeLinear to ver24 and expands type constraints |
defs/operator_sets.h | Adds forward declarations and schema iteration for new ops in version 24 |
defs/nn/old.cc | Adds Flatten ver23 schema |
defs/nn/defs.cc | Upgrades Flatten to ver24, expands type constraints |
defs/generator/old.cc | Adds Constant and ConstantOfShape ver23 schema |
defs/generator/defs.cc | Upgrades Constant/ConstantOfShape to ver24, expands types |
defs/controlflow/old.cc | Adds If, Loop, Scan ver23 schemas |
defs/controlflow/defs.cc | Upgrades If, Loop, Scan to ver24, updates control_flow_types_ir12 |
docs/Operators.md | Updates operator index and version links to include opset 24 |
Comments suppressed due to low confidence (3)
onnx/defs/quantization/defs.cc:103
- The new float8e8m0 type is added here; consider adding unit tests for QuantizeLinear and DequantizeLinear with float8e8m0 to verify correct behavior and inference.
{"tensor(float)", "tensor(float16)", "tensor(bfloat16)", "tensor(int32)", "tensor(float8e8m0)"},
onnx/version_converter/convert.h:833
- [nitpick] Consider refactoring the repeated registerAdapter calls into a loop over operator names to reduce duplication and simplify future maintenance.
registerAdapter(std::make_unique<CompatibleAdapter>("CastLike", OpSetID(23), OpSetID(24)));
onnx/defs/tensor/old.cc:2531
- [nitpick] This call duplicates the earlier propagateElemTypeFromInputToOutput invocation at the top of the function. Removing the duplicate could simplify the inference logic.
propagateElemTypeFromInputToOutput(ctx, 0, 0);
LGTM. Can you update tests for CastLike? Would be good to have a second review.
Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
@justinchuby seems like there's a lint error caught in the pipeline. It's unclear to me what the error message is referring to and it's also not reproducible locally. Any thoughts?
I don't see a lint error, just a 404. Maybe retry?
One thing I forgot: Lines 696 to 697 in eef5187
Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
Thanks. Updated the comment.
Pull request was closed
Description
This is a follow-up PR to #7030, which added the float8e8m0 dtype and updated the Cast op.
This PR enables float8e8m0 for the following ops in opset 24: CastLike, QuantizeLinear, DequantizeLinear, Reshape, Shape, Size, Flatten, Transpose, Squeeze, Unsqueeze, Identity, Pad, Constant, ConstantOfShape, If, Loop, and Scan.
Motivation and Context