[Feature request] Adding output_dtype attribute to QuantizeLinear #5943

@galagam

Description

System information

Main branch, top-of-tree.

What is the problem that this feature solves?

QuantizeLinear supports the output types UINT8, INT8, UINT16, INT16, UINT4, INT4, and FLOAT8*.
To specify any type other than the default UINT8, the user must provide a zero-point tensor, from which the output dtype is derived. As a result, a zero-point tensor is often defined solely to signal the output data type.

Using the zero-point solely to specify the output data type poses several problems:

  1. Increased Model Size: Including a zero-point tensor, especially under block quantization, can unnecessarily inflate the model: an additional data_size/block_size zeros must be stored, which contribute nothing to the model's functionality but occupy storage and memory.
  2. Computational Overhead: For backends processing the QuantizeLinear operation, a large zero-point tensor filled with zeros forces them either to verify that all zero-point values are zero or to needlessly perform the addition.
  3. Difficulty in Generating Non-standard Data Types: When exporting models from frameworks such as PyTorch, generating tensors of non-standard data types (e.g., FLOAT8) to serve as zero points is challenging, limiting the accessibility of model quantization.
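As a sketch of the storage overhead in item 1: under block quantization, one zero-point element must be materialized per block even when every value is zero. The helper name and figures below are illustrative, not taken from the ONNX spec:

```python
# Illustrative sketch of the zero-point overhead under block quantization.
# zero_point_overhead is a hypothetical helper, not an ONNX API.

def zero_point_overhead(data_size: int, block_size: int) -> int:
    """Number of zero-point elements that must be stored just to
    signal the output dtype: one per quantization block."""
    return (data_size + block_size - 1) // block_size  # ceiling division

# e.g. a 1M-element tensor with block_size 128 stores thousands of
# zeros that carry no information beyond the output data type.
overhead = zero_point_overhead(1_000_000, 128)
```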

Alternatives considered

No response

Describe the feature

Add an optional output_dtype attribute to QuantizeLinear.
The output_dtype attribute will allow users to directly specify the desired output data type for the QuantizeLinear operation without the need to provide a zero_point tensor.
Supported data types will include UINT8, INT8, UINT16, INT16, UINT4, INT4, and FLOAT8, aligning with the current supported output types.
If output_dtype is not supplied but zero_point is, the data type is derived from zero_point.
If neither output_dtype nor zero_point is supplied, the data type defaults to UINT8.
If output_dtype and zero_point specify conflicting data types, the model is invalid.
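The resolution rules above could be sketched as follows. This is a hypothetical helper, not part of the ONNX API; the string dtype names stand in for ONNX TensorProto type enums:

```python
# Sketch of the proposed dtype-resolution rules for QuantizeLinear.
# resolve_output_dtype and the plain-string dtype names are
# illustrative assumptions, not ONNX API.

def resolve_output_dtype(output_dtype=None, zero_point_dtype=None):
    if output_dtype is not None and zero_point_dtype is not None:
        if output_dtype != zero_point_dtype:
            # Conflicting attribute and zero-point types: model is invalid.
            raise ValueError("output_dtype conflicts with zero_point dtype")
        return output_dtype
    if output_dtype is not None:
        return output_dtype          # taken directly from the attribute
    if zero_point_dtype is not None:
        return zero_point_dtype      # derived from the zero_point tensor
    return "UINT8"                   # default when neither is supplied
```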

Will this influence the current api (Y/N)?

Yes
Adding an attribute to QuantizeLinear

Feature Area

Operators

Are you willing to contribute it (Y/N)

Yes

Notes

@xadupre
