Description
System information
ONNX main branch (top-of-tree).
What is the problem that this feature solves?
QuantizeLinear supports output types UINT8, INT8, UINT16, INT16, UINT4, INT4, FLOAT8*.
In order to specify any type other than the default UINT8, the user must provide a zero-point tensor; the output data type is derived from the zero-point tensor's type. This forces users to define a zero-point tensor solely to signal the output data type.
Using the zero-point solely to specify the output data type poses several problems:
- Increased Model Size: The need to include a zero-point tensor, especially when dealing with block quantization, can lead to unnecessary inflation of the model size. This is because additional data_size/block_size zeros must be included, which do not contribute to the model's functionality but occupy storage and memory resources.
- Computational Overhead: For backends processing the QuantizeLinear operation, the presence of large zero-point tensors (filled with zeros) requires either verifying that every value in the zero-point tensor is zero, or performing an addition that has no effect.
- Difficulty in Generating Non-standard Data Types: When exporting models from frameworks such as PyTorch, generating tensors of non-standard data types (e.g., FLOAT8) to serve as zero points is challenging, which limits the accessibility of model quantization.
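To make the model-size point concrete, here is a rough back-of-the-envelope calculation (the tensor shape and block size below are hypothetical, chosen only for illustration) of how many zero-valued zero-point elements must be stored today just to signal the output dtype:

```python
# Hypothetical example: a block-quantized weight tensor.
# With block quantization, each block carries one zero-point element,
# even when every zero point is zero and exists only to signal the dtype.
data_size = 4096 * 4096   # elements in the weight tensor (assumed shape)
block_size = 64           # assumed quantization block size

# Redundant zero-point elements stored per tensor under the current spec.
num_zero_points = data_size // block_size
print(num_zero_points)    # 262144 zeros that carry no information
```

With an `output_dtype` attribute, these data_size/block_size zeros disappear entirely.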
Alternatives considered
No response
Describe the feature
Add an optional output_dtype attribute to QuantizeLinear.
The output_dtype attribute will allow users to directly specify the desired output data type for the QuantizeLinear operation without the need to provide a zero_point tensor.
Supported data types will include UINT8, INT8, UINT16, INT16, UINT4, INT4, and FLOAT8, aligning with the current supported output types.
- If output_dtype is not supplied but zero_point is, the data type is derived from zero_point.
- If neither output_dtype nor zero_point is supplied, the output data type defaults to UINT8.
- If output_dtype and zero_point specify conflicting data types, the model is invalid.
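The resolution rules above can be sketched as a small helper (a minimal sketch of the proposed semantics; the function name and string dtype codes are illustrative, not part of the ONNX API):

```python
DEFAULT_DTYPE = "UINT8"  # QuantizeLinear's current default output type

def resolve_output_dtype(output_dtype=None, zero_point_dtype=None):
    """Resolve QuantizeLinear's output type under the proposed rules.

    output_dtype:     value of the proposed attribute, or None if absent.
    zero_point_dtype: dtype of the zero_point input, or None if absent.
    """
    if output_dtype is not None and zero_point_dtype is not None:
        # Conflicting specifications make the model invalid.
        if output_dtype != zero_point_dtype:
            raise ValueError("output_dtype conflicts with zero_point dtype")
        return output_dtype
    if output_dtype is not None:
        return output_dtype          # attribute alone decides the type
    if zero_point_dtype is not None:
        return zero_point_dtype      # backward-compatible derivation
    return DEFAULT_DTYPE             # neither supplied: default to UINT8
```

For example, `resolve_output_dtype(output_dtype="INT4")` returns `"INT4"` with no zero-point tensor required, while `resolve_output_dtype()` preserves today's UINT8 default.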
Will this influence the current api (Y/N)?
Yes
Adding an attribute to QuantizeLinear
Feature Area
Operators
Are you willing to contribute it (Y/N)
Yes