Description
System information
onnx v1.16.0
main top-of-tree
What is the problem that this feature solves?
LLMs dominate DL research and development today. Recent networks require tens of GBs of weights, and the main inference bottleneck is memory access (both capacity and bandwidth). Recent papers show promising results with sub-byte data types, specifically weight-only quantization to int4.
The motivation behind weight-only quantization is improved performance (larger batches improve MMA utilization, and the reduced memory traffic benefits memory-bound generation) as well as enabling large models that otherwise could not execute on a single GPU.
By quantizing data to 4 bits, we can both reduce the model size and accelerate memory-bound inference use cases significantly.
Alternatives considered
N/A
Describe the feature
Every two int4 elements (i_0, i_1) will be packed into a single uint8, as follows:
buffer = (i_0 << 4) | (i_1 & 0x0F)
Tensors with an odd number of elements are not supported.
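For illustration, here is a minimal NumPy sketch of what such pack/unpack helpers could look like; the function names and the NumPy-based approach are assumptions for this issue, not the proposed ONNX API.

```python
import numpy as np

def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack a flat array with an even number of int4 values (range [-8, 7])
    into a uint8 buffer: buffer = (i_0 << 4) | (i_1 & 0x0F)."""
    assert values.size % 2 == 0, "tensors with an odd number of elements are not supported"
    v = values.astype(np.uint8) & 0x0F      # keep only the low 4 bits of each element
    i_0, i_1 = v[0::2], v[1::2]             # split into even/odd positions
    return (i_0 << 4) | i_1                 # high nibble = i_0, low nibble = i_1

def unpack_int4(buffer: np.ndarray) -> np.ndarray:
    """Unpack a uint8 buffer back into signed int4 values (sign-extended to int8)."""
    i_0 = (buffer >> 4) & 0x0F
    i_1 = buffer & 0x0F
    nibbles = np.empty(buffer.size * 2, dtype=np.uint8)
    nibbles[0::2], nibbles[1::2] = i_0, i_1
    # sign-extend: nibble values >= 8 represent negative numbers in two's complement
    return (nibbles.astype(np.int8) ^ 0x08) - 0x08

w = np.array([-8, 7, 3, -1], dtype=np.int8)
packed = pack_int4(w)                       # 2 bytes instead of 4
assert np.array_equal(unpack_int4(packed), w)
```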
- Add data-type TensorProto.INT4
- Add helper functions to pack and unpack int4
- Add support in QuantizeLinear, DequantizeLinear and some shape ops
Will this influence the current api (Y/N)?
Yes
An additional data type becomes available, and int4 is added to a subset of operators, including QuantizeLinear and DequantizeLinear (optionally: Shape, Size, Transpose, Reshape, Constant).
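As a rough sketch of the weight-only dequantization math these operators would apply (y = (x - zero_point) * scale, here with one scale/zero-point per output channel), assuming already-unpacked int4 values; function and variable names are illustrative, not the proposed operator API.

```python
import numpy as np

def dequantize_linear_int4(x_int4: np.ndarray, scale: np.ndarray,
                           zero_point: np.ndarray) -> np.ndarray:
    """DequantizeLinear-style math, y = (x - zero_point) * scale,
    with one scale/zero-point per output channel (axis 0).
    x_int4 holds already-unpacked int4 values in [-8, 7]."""
    x = x_int4.astype(np.float32)
    return (x - zero_point[:, None].astype(np.float32)) * scale[:, None]

# example: a 2x4 int4 weight matrix with per-channel scales
w_q = np.array([[-8, 7, 3, -1],
                [ 0, 2, -4, 5]], dtype=np.int8)
scales = np.array([0.05, 0.1], dtype=np.float32)
zero_points = np.zeros(2, dtype=np.int8)
w_fp = dequantize_linear_int4(w_q, scales, zero_points)  # float32 weights for the MatMul
```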
Feature Area
data-types, operators
Are you willing to contribute it (Y/N)
Yes
Notes
Relevant papers using int4 quantization for LLMs:
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers