leo0519 (Collaborator) commented Nov 14, 2023

PR types

Bug fixes

PR changes

Others

Description

This pull request adds a new pass, AddQuantDequantForResidual, in quantization_pass.py.
With this pass, quant_aware can insert QDQ (quantize/dequantize) nodes on residual connections so that INT8 inference runs entirely in low precision. Without these nodes, some kernels may still run in floating-point precision and produce floating-point intermediate tensors.
This PR is an example for the issue "The model quantized by QAT API should have QDQ nodes before skip connection".
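
To illustrate the idea, here is a minimal sketch of such a graph rewrite on a toy op list. It is not the code in quantization_pass.py: the `Op` dataclass, the `add_qdq_for_residual` function, and the `.quantized` / `.dequantized` variable suffixes are hypothetical names chosen for this example, and the op type strings are used only as labels. The sketch assumes a residual connection shows up as an `elementwise_add` whose inputs do not already come from a dequantize op.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Op:
    """Hypothetical toy op: a type name, input variable names, one output name."""
    type: str
    inputs: List[str]
    output: str


def add_qdq_for_residual(ops: List[Op]) -> List[Op]:
    """Toy pass: route every input of an elementwise_add through Q -> DQ."""
    producer: Dict[str, Op] = {op.output: op for op in ops}
    inserted: Dict[str, str] = {}  # original var -> dequantized var (avoid duplicates)
    result: List[Op] = []
    for op in ops:
        if op.type == "elementwise_add":
            new_inputs = []
            for name in op.inputs:
                src = producer.get(name)
                if src is not None and src.type == "dequantize_linear":
                    # Already fed by a dequantize op, nothing to insert.
                    new_inputs.append(name)
                    continue
                if name not in inserted:
                    q_out = name + ".quantized"
                    dq_out = name + ".dequantized"
                    result.append(Op("quantize_linear", [name], q_out))
                    result.append(Op("dequantize_linear", [q_out], dq_out))
                    inserted[name] = dq_out
                new_inputs.append(inserted[name])
            op = Op(op.type, new_inputs, op.output)
        result.append(op)
    return result


# A tiny residual block: y = conv2d(QDQ(x), w); z = x + y
ops = [
    Op("quantize_linear", ["x"], "x.q"),
    Op("dequantize_linear", ["x.q"], "x.dq"),
    Op("conv2d", ["x.dq", "w"], "y"),
    Op("elementwise_add", ["x", "y"], "z"),  # skip connection, inputs are FP
]
rewritten = add_qdq_for_residual(ops)
for op in rewritten:
    print(op.type, op.inputs, "->", op.output)
```

In this toy graph both inputs of the add are floating-point tensors before the rewrite; afterwards each one passes through a quantize/dequantize pair, which is what lets an inference engine fuse the add into an INT8 kernel.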

yghstill previously approved these changes Nov 23, 2023
leo0519 (Collaborator, Author) commented Nov 23, 2023

This PR currently inserts QDQ nodes before skip connections only through the distributed optimizer, but this should be implemented in the QAT API (e.g. quant_aware or quantization_pass).

I will provide a complete version for this.

Marked as draft.

leo0519 marked this pull request as draft November 23, 2023 03:12
leo0519 marked this pull request as ready for review November 29, 2023 01:45
leo0519 changed the title from "Add QDQ into skip-connection in qat meta optimizer" to "Add a pass to insert QDQ nodes before residual connection" Nov 29, 2023
AdamzNV requested a review from yghstill November 30, 2023 02:38
wanghaoshuang merged commit de9d407 into PaddlePaddle:develop Dec 4, 2023
SigureMo pushed a commit to gouzil/Paddle that referenced this pull request Dec 5, 2023
3 participants