Description
Currently, direct quantization of a torch.nn.MultiheadAttention module is not supported via torch.quantization.quantize_dynamic() - a conversion must be performed first (reference). However, attempting to quantize it directly does not produce an error; instead, certain layers silently fail to be quantized, and the returned model is still callable. There is no overt sign that anything has gone wrong. See #58727 for full details.
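A minimal sketch of the misuse (the module shape and names here are illustrative, not taken from the original repro):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=8, num_heads=2)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

model = Model()

# Unsupported: nn.MultiheadAttention needs to be converted to its
# quantizable counterpart first. This call nonetheless returns without
# complaint, and the attention's internals are silently left unquantized.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)  # callable, looks fine - no sign the quantization failed
```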
A recent cleanup of torch.nn introduced a downstream error when scripting such an improperly quantized module: it rationalized module names in a way that allowed the problematic quantization to go forward. The resulting error is a vanilla TorchScript type error due to a naming collision, with no clues that would lead one back to the original problem without nontrivial sleuthing. Also, the error was reported by users, meaning the quantization API is in fact being inadvertently misused in this way. The repro used to track down the issue is here, derived from a bug report.
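The shape of that failure path, sketched with the illustrative Model above (on affected builds only; the exact error text is omitted - see the linked repro for the real thing):

```python
# On the affected builds, the rationalized names let the problematic
# quantization go forward rather than being silently skipped...
quantized = torch.quantization.quantize_dynamic(
    Model(), {nn.Linear}, dtype=torch.qint8
)

# ...and scripting the improperly quantized result then fails with a
# generic TorchScript type error from the name collision, with nothing
# pointing back to the unsupported quantize_dynamic call.
scripted = torch.jit.script(quantized)
```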
In the short term we've reinstated the name difference that defeats quantization when the API is used improperly, restoring the original behavior. But a proper fix would be for quantize_dynamic (or something else in the quantization API) to give the caller a signal that an unsupported model has been specified.
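One hedged sketch of what that signal could look like (everything below is hypothetical, not an existing PyTorch API): walk the model up front and warn about submodule types that dynamic quantization is known to skip.

```python
import warnings

import torch.nn as nn

# Hypothetical helper, not part of torch.quantization: module types that
# quantize_dynamic silently skips and should be flagged to the caller.
_UNSUPPORTED_FOR_DYNAMIC_QUANT = (nn.MultiheadAttention,)

def _warn_on_unsupported(model: nn.Module) -> None:
    for name, module in model.named_modules():
        if isinstance(module, _UNSUPPORTED_FOR_DYNAMIC_QUANT):
            warnings.warn(
                f"{name or 'model'} ({type(module).__name__}) is not supported "
                "by quantize_dynamic and will be left unquantized; convert it "
                "to a quantizable equivalent first."
            )

# Usage: with the illustrative Model above, this would warn about 'attn'.
# _warn_on_unsupported(Model())
```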