Fix shape inference for DequantizeLinear #5709

xadupre · 2023-10-27T10:48:22Z

Description

Fix shape inference for types float16 and bfloat16 for operator DequantizeLinear.

Motivation and Context

Shape inference is wrong for DequantizeLinear-19, as reported in issue #5704.

Signed-off-by: Xavier Dupre <xadupre@microsoft.com>

codecov · 2023-10-27T10:55:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Files	Coverage Δ
onnx/test/shape_inference_test.py	`99.49% <100.00%> (+<0.01%)`	⬆️

…hts (#18043)

### Description

Whenever a QuantizeLinear or DequantizeLinear node is created, the type of the weights before quantization must be known so that the scale is created with the expected type. Another option would be to add many CastLike operators, but that would push the burden onto the onnxruntime optimizer.

The PR tries to avoid changing the signature. To do so, it modifies the scale computation to store the result in a numpy array rather than a Python float. The numpy array must have the same type as the weights to quantize.

The PR adds many `assert` statements to check that the type of the scale is not a Python type or float64. They were added to make sure all the code follows the same logic, and they were kept for the first review.

DequantizeLinear and QuantizeLinear cannot be tested with onnx==1.15: PR onnx/onnx#5709 is needed to fix shape inference, and PR onnx/onnx#5473 is needed to support QLinearMatMul with float 16. That explains why some tests are disabled with float 16.

### Motivation and Context

The current quantization tool assumes every weight is float 32. For large models such as LLAMA, the weights are usually float 16. The quantization tool needs to be able to quantize such weights.
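The idea of keeping the scale in a numpy array of the weights' type can be illustrated with a minimal sketch. This is not the onnxruntime implementation: the function name `compute_scale`, the symmetric int8 formula, and the default `qmax` are assumptions chosen only to show the dtype handling and the kind of `assert` checks the description mentions.

```python
import numpy as np


def compute_scale(weights: np.ndarray, qmax: int = 127) -> np.ndarray:
    """Hypothetical symmetric scale computation that preserves the weight dtype.

    The intermediate value is computed as a Python float, then stored in a
    0-d numpy array with the same dtype as the weights, so a float16 model
    gets a float16 scale rather than a float64 one.
    """
    absmax = float(np.abs(weights).max())
    scale = np.array(absmax / qmax, dtype=weights.dtype)

    # Checks in the spirit of the asserts described above: the scale must be
    # a numpy array, must match the weights' type, and must not be float64.
    assert isinstance(scale, np.ndarray)
    assert scale.dtype == weights.dtype
    assert scale.dtype != np.float64
    return scale


if __name__ == "__main__":
    w16 = np.array([0.5, -1.0, 0.25], dtype=np.float16)
    w32 = w16.astype(np.float32)
    print(compute_scale(w16).dtype, compute_scale(w32).dtype)
```

Storing the result in an array rather than a Python float means the dtype travels with the value, which is what lets the scale initializer be created with the expected type without changing function signatures.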

Fix shape inference for DequantizeLinear

aa7eeb9

Signed-off-by: Xavier Dupre <xadupre@microsoft.com>

lint

b0f7570

Signed-off-by: Xavier Dupre <xadupre@microsoft.com>

xadupre marked this pull request as ready for review October 27, 2023 14:25

xadupre requested review from a team as code owners October 27, 2023 14:25

justinchuby approved these changes Oct 27, 2023

xadupre enabled auto-merge October 27, 2023 14:36

xadupre mentioned this pull request Oct 27, 2023

Quantization tool: support float 8 with MatMul, support float 16 weights microsoft/onnxruntime#18043

Merged

gramalingam approved these changes Oct 30, 2023

xadupre added this pull request to the merge queue Oct 30, 2023

Merged via the queue into onnx:main with commit b2ee94b Oct 30, 2023

xadupre deleted the dqshape branch October 30, 2023 17:47
