Improper use of quantization API for MHA should fail fast #58969

@bhosmer

Description

Currently, direct quantization of a torch.nn.MultiheadAttention module is not supported by torch.quantization.quantize_dynamic(); a conversion must be performed first (reference). However, an erroneous attempt to quantize directly does not produce an error. Instead, certain layers silently fail to be quantized, yet the returned model is still callable, with no overt sign that anything has gone wrong. See #58727 for full details.
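
A minimal sketch of the misuse (module names and shapes here are illustrative, and behavior is as observed around the time of this issue):

```python
import torch
import torch.nn as nn

# Minimal model wrapping a MultiheadAttention submodule
# (dimensions are illustrative).
class AttnModel(nn.Module):
    def __init__(self, embed_dim=16, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

model = AttnModel()

# Improper use: dynamically quantizing the model directly, without the
# required conversion step. The call succeeds and returns a callable
# model, but the MHA's internal projection layers are silently left
# unquantized; there is no error and no warning.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear, nn.MultiheadAttention}, dtype=torch.qint8
)

x = torch.randn(5, 2, 16)  # (seq_len, batch, embed_dim)
print(quantized(x).shape)  # runs fine in float; nothing signals the failure
```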

A recent cleanup of torch.nn rationalized module names in a way that allowed the problematic quantization to go forward, introducing a downstream error when scripting such an improperly quantized module. The error is a vanilla TorchScript type error caused by a naming collision, and it carries no clues that would lead one back to the original problem without nontrivial sleuthing. Notably, the error was reported by users, which means the quantization API is in fact being inadvertently misused in this way. The repro used to track down the issue is here, derived from a bug report.
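
For context, the downstream symptom looked roughly like this (a sketch; it depends on the temporary state of torch.nn described above, and the exact error text varies by version):

```python
# Continuing the sketch above: during the window where the MHA output
# projection was a plain nn.Linear, quantize_dynamic swapped it for a
# dynamically quantized Linear. Scripting the result then failed with
# a generic TorchScript type error about conflicting module types,
# with nothing pointing back to the unsupported quantization call.
scripted = torch.jit.script(quantized)
```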

In the short term we've reinstated the name difference that defeats quantization when the API is used improperly, restoring the original behavior. A proper fix, though, would be for quantize_dynamic (or something else in the quantization API) to give the caller a clear signal that an unsupported model has been specified.
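
One possible shape for that signal, as a hedged sketch: a validation pass over the model before conversion. The helper name and the unsupported-type list below are hypothetical, not part of the quantization API.

```python
import torch.nn as nn

# Hypothetical guard: module types known to require conversion before
# dynamic quantization (list is illustrative, not exhaustive).
_UNSUPPORTED_DIRECT_DYNAMIC = (nn.MultiheadAttention,)

def assert_dynamically_quantizable(model: nn.Module) -> None:
    """Fail fast if the model contains submodules that quantize_dynamic
    would silently skip (hypothetical helper, not in torch)."""
    for name, mod in model.named_modules():
        if isinstance(mod, _UNSUPPORTED_DIRECT_DYNAMIC):
            raise TypeError(
                f"Submodule '{name}' ({type(mod).__name__}) cannot be "
                "quantized directly by quantize_dynamic; convert it first."
            )
```

Called at the top of quantize_dynamic (or by the user before quantizing), a check like this would turn the silent failure into an immediate, actionable error.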

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo
