Describe the bug
I have taken these representations from the TensorRT QAT presentation.
Figure(1)
As shown in Figure(1) above, I added `QuantizeAndDequantizeV2` nodes before the `conv2d` op and for the `conv2d` kernel in my model.
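Roughly, the insertion looks like the sketch below (a minimal illustration only: the `QDQConv2D` name and the fixed ranges are placeholders for my actual setup, which calibrates the ranges, and depending on the TF version the recorded graph op may be `QuantizeAndDequantizeV2` or a newer variant):

```python
import tensorflow as tf

class QDQConv2D(tf.keras.layers.Layer):
    """Conv2D with QDQ (fake-quantization) on both the input and the kernel."""

    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.kernel_size = kernel_size

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name="kernel",
            shape=(self.kernel_size, self.kernel_size,
                   input_shape[-1], self.filters))

    def call(self, x):
        # QDQ on the conv input (per-tensor; fixed range for illustration).
        x = tf.quantization.quantize_and_dequantize_v2(
            x, input_min=-1.0, input_max=1.0, range_given=True)
        # QDQ on the conv kernel (per-channel over the output-channel axis).
        w = tf.quantization.quantize_and_dequantize_v2(
            self.kernel,
            input_min=[-1.0] * self.filters,
            input_max=[1.0] * self.filters,
            range_given=True, axis=3)
        return tf.nn.conv2d(x, w, strides=1, padding="SAME")
```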
But after converting it with tf2onnx, I cannot find the `QuantizeLinear` and `DequantizeLinear` nodes for the `conv2d` kernel. As shown in Figure(2) below, I was expecting tf2onnx to keep them rather than fold them into constants.
Figure(2)
Figure(3)
TensorBoard visualization of my model, showing `QuantizeAndDequantizeV2` ops for both input and weights.
Figure(4)
Netron visualization of the ONNX model, showing `QuantizeLinear` and `DequantizeLinear` nodes only for the `conv2d` input.
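For reference, the conversion was a standard tf2onnx invocation along these lines (the path and opset here are illustrative):

```
python -m tf2onnx.convert --saved-model saved_model_dir --opset 13 --output model.onnx
```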
It seems logical to fold the `QuantizeLinear` and `DequantizeLinear` nodes for weights into a constant, as described in #1394. But looking at Figure(5) below, it appears that TensorRT requires the `QuantizeLinear` and `DequantizeLinear` nodes for both the `conv2d` input and the weights!
Urgency
Blocked use case: TensorFlow 2.x QAT -> ONNX -> TensorRT.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- TensorFlow version: 2.6
- Python version: 3.8
To Reproduce
Adding `QuantizeAndDequantizeV2` nodes into TF 2.x graphs is not trivial, so it is difficult to include all of the code here; the sketch below outlines the approach.
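A minimal end-to-end sketch (illustrative only, reusing the `QDQConv2D` layer sketched above; my real model and ranges differ):

```python
import tensorflow as tf
import tf2onnx

# Tiny model wrapping the QDQConv2D layer sketched above.
inputs = tf.keras.Input(shape=(32, 32, 3))
outputs = QDQConv2D(filters=8, kernel_size=3)(inputs)
model = tf.keras.Model(inputs, outputs)

# Convert the Keras model directly to ONNX.
spec = (tf.TensorSpec((None, 32, 32, 3), tf.float32, name="input"),)
onnx_model, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="model.onnx")
```

With my actual model, inspecting the result in Netron gives Figure(4): the input QDQ survives, but the kernel's QDQ pair is folded into a constant.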