[Feature request] Adding output_dtype attribute to QuantizeLinear #5943

@galagam

Description

System information

Main branch, top-of-tree.

What is the problem that this feature solves?

QuantizeLinear supports the output types UINT8, INT8, UINT16, INT16, UINT4, INT4, and FLOAT8*.
To specify any type other than the default UINT8, the user must provide a zero-point tensor, from which the output dtype is derived. As a result, a zero-point tensor is often defined solely to signal the output data type.

Using the zero-point solely to specify the output data type poses several problems:

  1. Increased Model Size: Including a zero-point tensor, especially under block quantization, can unnecessarily inflate the model: an additional data_size/block_size zeros must be stored, which contribute nothing to the model's functionality but occupy storage and memory.
  2. Computational Overhead: For backends processing the QuantizeLinear operation, a large zero-point tensor filled with zeros forces them either to verify that all zero-point values are zero or to needlessly perform the addition.
  3. Difficulty in Generating Non-standard Data Types: When exporting models from frameworks such as PyTorch, generating tensors of non-standard data types (e.g., FLOAT8) to serve as zero points is challenging, limiting the accessibility of model quantization.
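As a sketch of the storage overhead in item 1: under block quantization, one zero-point element must be materialized per block even when every value is zero. The helper name and figures below are illustrative, not taken from the ONNX spec:

```python
# Illustrative sketch of the zero-point overhead under block quantization.
# zero_point_overhead is a hypothetical helper, not an ONNX API.

def zero_point_overhead(data_size: int, block_size: int) -> int:
    """Number of zero-point elements that must be stored just to
    signal the output dtype: one per quantization block."""
    return (data_size + block_size - 1) // block_size  # ceiling division

# e.g. a 1M-element tensor with block_size 128 stores thousands of
# zeros that carry no information beyond the output data type.
overhead = zero_point_overhead(1_000_000, 128)
```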

Alternatives considered

No response

Describe the feature

Add an optional output_dtype attribute to QuantizeLinear.
The output_dtype attribute will allow users to directly specify the desired output data type for the QuantizeLinear operation without the need to provide a zero_point tensor.
Supported data types will include UINT8, INT8, UINT16, INT16, UINT4, INT4, and FLOAT8, aligning with the current supported output types.
If output_dtype is not supplied but zero_point is, the data type is derived from zero_point.
If neither output_dtype nor zero_point is supplied, the data type defaults to UINT8.
If output_dtype and zero_point specify conflicting data types, the model is invalid.
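The resolution rules above could be sketched as follows. This is a hypothetical helper, not part of the ONNX API; the string dtype names stand in for ONNX TensorProto type enums:

```python
# Sketch of the proposed dtype-resolution rules for QuantizeLinear.
# resolve_output_dtype and the plain-string dtype names are
# illustrative assumptions, not ONNX API.

def resolve_output_dtype(output_dtype=None, zero_point_dtype=None):
    if output_dtype is not None and zero_point_dtype is not None:
        if output_dtype != zero_point_dtype:
            # Conflicting attribute and zero-point types: model is invalid.
            raise ValueError("output_dtype conflicts with zero_point dtype")
        return output_dtype
    if output_dtype is not None:
        return output_dtype          # taken directly from the attribute
    if zero_point_dtype is not None:
        return zero_point_dtype      # derived from the zero_point tensor
    return "UINT8"                   # default when neither is supplied
```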

Will this influence the current api (Y/N)?

Yes
Adding an attribute to QuantizeLinear

Feature Area

Operators

Are you willing to contribute it (Y/N)

Yes

Notes

@xadupre
