yuanyao-nv
Contributor

Description

This is a follow-up PR to #7030, which added the float8e8m0 dtype and updated the Cast op.

This PR enables float8e8m0 for the following ops in opset 24:

  • QuantizeLinear, DequantizeLinear, CastLike
  • Constant, ConstantOfShape, Identity, Reshape, Shape, Size, If, Loop, Scan, Flatten, Pad, Squeeze, Unsqueeze, Transpose
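float8e8m0 is an exponent-only format, so every value it can hold is a power of two. The following is a minimal sketch of decoding one byte. It assumes the OCP Microscaling (MX) E8M0 layout: 8 exponent bits with bias 127, no sign bit, no mantissa, and 0xFF reserved for NaN. `DecodeE8M0` is a hypothetical helper for illustration, not an ONNX API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode one float8e8m0 byte to float.
// Assumes the OCP MX E8M0 layout: 8 exponent bits, bias 127,
// no sign, no mantissa, 0xFF = NaN.
inline float DecodeE8M0(uint8_t bits) {
  if (bits == 0xFF) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  // value = 2^(bits - 127); ldexp scales 1.0f by that power of two.
  return std::ldexp(1.0f, static_cast<int>(bits) - 127);
}
```

For example, 127 decodes to 1.0, 128 to 2.0 and 126 to 0.5.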

Motivation and Context

Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
yuanyao-nv requested review from a team as code owners July 11, 2025 17:46
github-project-automation bot moved this to In progress in PR Tracker Jul 11, 2025
Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
Comment on lines +52 to +53
ONNX_OPERATOR_SET_SCHEMA(
If,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verIf is never read.

Comment on lines +378 to +379
ONNX_OPERATOR_SET_SCHEMA(
Loop,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verLoop is never read.

Comment on lines +852 to +853
ONNX_OPERATOR_SET_SCHEMA(
Scan,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verScan is never read.

Comment on lines +404 to +405
ONNX_OPERATOR_SET_SCHEMA(
Constant,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verConstant is never read.

Comment on lines +741 to +742
ONNX_OPERATOR_SET_SCHEMA(
ConstantOfShape,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verConstantOfShape is never read.

Comment on lines +2472 to +2473
ONNX_OPERATOR_SET_SCHEMA(
Transpose,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verTranspose is never read.

Comment on lines +3532 to +3533
ONNX_OPERATOR_SET_SCHEMA(
Squeeze,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verSqueeze is never read.

Comment on lines +3899 to +3900
ONNX_OPERATOR_SET_SCHEMA(
Unsqueeze,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verUnsqueeze is never read.

Comment on lines +4764 to +4765
ONNX_OPERATOR_SET_SCHEMA(
Identity,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verIdentity is never read.

Comment on lines +5202 to +5203
ONNX_OPERATOR_SET_SCHEMA(
Pad,
Code scanning / CodeQL notice (Unused static variable): Static variable dbg_count_check_Onnx_23_verPad is never read.
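Each of these notices points at an opset 23 schema registration. They suggest that the registration macro defines a static variable only for the side effect of its initializer. The sketch below shows that general pattern. `SKETCH_OPERATOR_SET_SCHEMA`, `RegisterSchema` and `RegisteredSchemas` are hypothetical names for illustration, not ONNX's actual macro or registry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry standing in for a schema registry.
inline std::vector<std::string>& RegisteredSchemas() {
  static std::vector<std::string> schemas;  // function-local: safe during static init
  return schemas;
}

inline int RegisterSchema(const std::string& name, int version) {
  RegisteredSchemas().push_back(name + "-" + std::to_string(version));
  return 0;
}

// Hypothetical macro: the static variable exists only so its initializer runs
// before main() and registers the schema. [[maybe_unused]] quiets compiler
// warnings, but the variable is still never read, which is the pattern an
// "unused static variable" query reports.
#define SKETCH_OPERATOR_SET_SCHEMA(name, ver) \
  [[maybe_unused]] static int dbg_count_check_##name##_ver##ver = RegisterSchema(#name, ver)

SKETCH_OPERATOR_SET_SCHEMA(If, 23);
SKETCH_OPERATOR_SET_SCHEMA(If, 24);
```

Because the registration happens in the initializer, the notices are informational rather than a sign that registration is broken.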
justinchuby requested a review from Copilot July 11, 2025 18:28
Contributor

Copilot AI left a comment

Pull Request Overview

This PR enables the float8e8m0 data type for various tensor and quantization operators in opset 24, including casting, quantization/dequantization, and control‐flow and shape‐manipulation ops.

  • Registers new version‐conversion adapters and type restrictions for float8e8m0 across opset 23→24 and 24→23
  • Updates operator schemas (CastLike, QuantizeLinear, DequantizeLinear, etc.) to version 24, adding a round_mode attribute and expanding type constraints to IRv12
  • Refreshes documentation to include opset 24 versions and float8e8m0 support
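The overview mentions a round_mode attribute. Assuming it selects how a float is rounded to a float8e8m0 exponent, the following is a minimal sketch of that encoding. The mode names `up`, `down` and `nearest`, the ties-go-up rule for `nearest`, and mapping an overflow to the NaN code are assumptions here, not quoted from the schema. The sketch relies on float32 and E8M0 sharing the same bias-127 exponent field, so encoding a positive normal float is its exponent field, plus one when rounding up. `EncodeE8M0` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical sketch: encode a positive, normal float32 as a float8e8m0 byte.
// float32's 8-bit exponent field already uses bias 127, like E8M0.
inline uint8_t EncodeE8M0(float x, const std::string& round_mode) {
  assert(std::isnormal(x) && x > 0.0f);  // sketch covers positive normal inputs only
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  uint32_t exponent = (bits >> 23) & 0xFFu;  // biased exponent: 1..254
  uint32_t mantissa = bits & 0x7FFFFFu;      // fraction bits below the leading 1
  bool round_up = false;
  if (round_mode == "up") {
    round_up = mantissa != 0;                // any fraction rounds away from zero
  } else if (round_mode == "nearest") {
    round_up = (mantissa & 0x400000u) != 0;  // fraction >= 0.5 rounds up (ties up)
  }                                          // "down": drop the fraction
  if (round_up) {
    ++exponent;  // 254 + 1 == 255 lands on the NaN code (assumption)
  }
  return static_cast<uint8_t>(exponent);
}
```

With these rules, 1.5 encodes to 128 (2.0) under `up` and `nearest` but to 127 (1.0) under `down`.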

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Summary per file:

  • version_converter/convert.h: Registers CompatibleAdapter and TypeRestriction for new ops and float8e8m0
  • defs/tensor/old.cc: Adds version 23 schemas for CastLike, Reshape, Shape, Size, Transpose, Squeeze, Unsqueeze
  • defs/tensor/defs.cc: Updates those ops to version 24, expands type constraints to IRv12, adds round_mode
  • defs/quantization/old.cc: Introduces float8e8m0 in QuantizeLinear ver23 schema
  • defs/quantization/defs.cc: Upgrades QuantizeLinear & DequantizeLinear to ver24 and expands type constraints
  • defs/operator_sets.h: Adds forward declarations and schema iteration for new ops in version 24
  • defs/nn/old.cc: Adds Flatten ver23 schema
  • defs/nn/defs.cc: Upgrades Flatten to ver24, expands type constraints
  • defs/generator/old.cc: Adds Constant and ConstantOfShape ver23 schema
  • defs/generator/defs.cc: Upgrades Constant/ConstantOfShape to ver24, expands types
  • defs/controlflow/old.cc: Adds If, Loop, Scan ver23 schemas
  • defs/controlflow/defs.cc: Upgrades If, Loop, Scan to ver24, updates control_flow_types_ir12
  • docs/Operators.md: Updates operator index and version links to include opset 24
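The convert.h entry pairs CompatibleAdapter with TypeRestriction. Presumably the TypeRestriction side makes a 24→23 downgrade fail when a node uses float8e8m0, because opset 23 does not allow that type for these ops. The following is a hypothetical sketch of such a check. `ElemType` and `CheckDowngradeTypes` are illustrative stand-ins, not ONNX's enum or adapter API.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for tensor element types; not ONNX's real enum.
enum class ElemType { kFloat, kFloat16, kFloat8E4M3FN, kFloat8E8M0 };

// Hypothetical type-restriction check for a downgrade: throws if any
// input/output element type of the node is not allowed in the target opset.
inline void CheckDowngradeTypes(const std::string& op_name,
                                const std::vector<ElemType>& node_types,
                                const std::vector<ElemType>& unallowed) {
  for (ElemType t : node_types) {
    if (std::find(unallowed.begin(), unallowed.end(), t) != unallowed.end()) {
      throw std::runtime_error(op_name + ": element type not supported in target opset");
    }
  }
}
```

Failing loudly is the safer choice here, since silently keeping a float8e8m0 tensor would produce an opset 23 model that does not validate.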
Comments suppressed due to low confidence (3)

onnx/defs/quantization/defs.cc:103

  • The new float8e8m0 type is added here; consider adding unit tests for QuantizeLinear and DequantizeLinear with float8e8m0 to verify correct behavior and inference.
            {"tensor(float)", "tensor(float16)", "tensor(bfloat16)", "tensor(int32)", "tensor(float8e8m0)"},
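Assume this list constrains the scale input; the comment does not say which input it governs. Under that assumption, a float8e8m0 scale is always a power of two, so DequantizeLinear's y = (x - zero_point) * scale reduces to an exponent shift. The following is a hypothetical per-tensor sketch with int8 data that could seed the suggested unit test. `DequantizeWithE8M0Scale` is an illustrative name, and it assumes the bias-127 decoding with the NaN code 0xFF left unhandled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor DequantizeLinear with a float8e8m0 scale byte.
// Assumes the scale decodes as 2^(scale_bits - 127); 0xFF (NaN) is not handled.
inline std::vector<float> DequantizeWithE8M0Scale(const std::vector<int8_t>& x,
                                                  uint8_t scale_bits,
                                                  int8_t zero_point) {
  const int shift = static_cast<int>(scale_bits) - 127;
  std::vector<float> y;
  y.reserve(x.size());
  for (int8_t q : x) {
    // (q - zero_point) * 2^shift: a power-of-two scale only moves the exponent.
    y.push_back(std::ldexp(static_cast<float>(q - zero_point), shift));
  }
  return y;
}
```

For example, a scale byte of 126 (0.5) maps the int8 values {4, -2, 0} to {2, -1, 0}.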

onnx/version_converter/convert.h:833

  • [nitpick] Consider refactoring the repeated registerAdapter calls into a loop over operator names to reduce duplication and simplify future maintenance.
    registerAdapter(std::make_unique<CompatibleAdapter>("CastLike", OpSetID(23), OpSetID(24)));
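A minimal sketch of the suggested refactor follows. To make it compile on its own, it uses hypothetical stand-ins for OpSetID, CompatibleAdapter and registerAdapter; the real ones live in ONNX's version converter and differ in detail. It also assumes the 23→24 list matches the ops named in the PR description.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins so this compiles on its own.
struct OpSetID {
  explicit OpSetID(long v) : version(v) {}
  long version;
};

struct CompatibleAdapter {
  CompatibleAdapter(std::string name, OpSetID from, OpSetID to)
      : op_name(std::move(name)), initial(from), target(to) {}
  std::string op_name;
  OpSetID initial;
  OpSetID target;
};

std::vector<std::unique_ptr<CompatibleAdapter>> registered_adapters;

void registerAdapter(std::unique_ptr<CompatibleAdapter> adapter) {
  registered_adapters.push_back(std::move(adapter));
}

// The refactor itself: one loop over op names instead of one call per op.
// Op list assumed from the PR description.
void RegisterOpset23To24CompatibleAdapters() {
  static const char* const kOps[] = {
      "QuantizeLinear", "DequantizeLinear", "CastLike", "Constant",
      "ConstantOfShape", "Identity", "Reshape", "Shape", "Size", "If",
      "Loop", "Scan", "Flatten", "Pad", "Squeeze", "Unsqueeze", "Transpose"};
  for (const char* op : kOps) {
    registerAdapter(std::make_unique<CompatibleAdapter>(op, OpSetID(23), OpSetID(24)));
  }
}
```

A table-driven loop keeps the opset list in one place, at the cost of making a single op's registration slightly harder to grep for.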

onnx/defs/tensor/old.cc:2531

  • [nitpick] This call duplicates the earlier propagateElemTypeFromInputToOutput invocation at the top of the function. Removing the duplicate could simplify the inference logic.
          propagateElemTypeFromInputToOutput(ctx, 0, 0);

github-project-automation bot moved this from In progress to Reviewer approved in PR Tracker Jul 11, 2025
@justinchuby
Member

LGTM. Can you update the tests for CastLike? It would be good to have a second review.

Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
@yuanyao-nv
Contributor Author

@justinchuby it seems there's a lint error caught in the pipeline. It's unclear to me what the error message refers to, and it isn't reproducible locally. Any thoughts?

@justinchuby
Member

I don't see a lint error, just a 404. Maybe retry?

@justinchuby
Member

justinchuby commented Jul 11, 2025

One thing I forgot:

onnx/onnx/onnx.proto

Lines 696 to 697 in eef5187

// INT32, INT16, INT8, INT4, UINT16, UINT8, UINT4, BOOL, FLOAT16, BFLOAT16, FLOAT8E4M3FN, FLOAT8E4M3FNUZ, FLOAT8E5M2, FLOAT8E5M2FNUZ, FLOAT4E2M1
repeated int32 int32_data = 5 [packed = true];
Should this line also be updated? Could you also check that to_array, from_array and make_tensor handle it when the data is stored in int32_data?
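The onnx.proto comment lists the element types whose data may be stored in int32_data. Assume float8e8m0 follows the same convention as the other 8-bit float types: one int32 entry per element, with the raw bits in the low byte. Under that assumption, packing and unpacking reduce to the following hypothetical sketch. `PackE8M0ToInt32Data` and `UnpackE8M0FromInt32Data` are illustrative names, not the onnx helpers named above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: store each float8e8m0 byte in its own int32 entry.
inline std::vector<int32_t> PackE8M0ToInt32Data(const std::vector<uint8_t>& raw) {
  return std::vector<int32_t>(raw.begin(), raw.end());  // zero-extends each byte
}

// Hypothetical: recover the raw bytes from the low 8 bits of each entry.
inline std::vector<uint8_t> UnpackE8M0FromInt32Data(const std::vector<int32_t>& data) {
  std::vector<uint8_t> out;
  out.reserve(data.size());
  for (int32_t v : data) {
    out.push_back(static_cast<uint8_t>(v & 0xFF));
  }
  return out;
}
```

A round trip through these two functions should return the original bytes, including the NaN code 0xFF.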

Signed-off-by: Yuan Yao <yuanyao@nvidia.com>
@yuanyao-nv
Contributor Author

Thanks. Updated the comment.
The helper functions have been tested in the previous PR.

yuanyao-nv enabled auto-merge (squash) July 12, 2025 06:28
yuanyao-nv closed this Jul 12, 2025
auto-merge was automatically disabled July 12, 2025 06:29

Pull request was closed

github-project-automation bot moved this from Reviewer approved to Done in PR Tracker Jul 12, 2025
yuanyao-nv reopened this Jul 12, 2025
github-project-automation bot moved this from Done to In progress in PR Tracker Jul 12, 2025
yuanyao-nv enabled auto-merge (squash) July 12, 2025 06:30
yuanyao-nv merged commit 36afe4f into onnx:main Jul 12, 2025
62 of 71 checks passed
github-project-automation bot moved this from In progress to Done in PR Tracker Jul 12, 2025
justinchuby added the topic: operator Issues related to ONNX operators label Jul 24, 2025
Labels
topic: operator Issues related to ONNX operators
Projects
Status: Done
2 participants