The LayerNormalization operation is defined as a sequence of ops. In this sequence, the LayerNormalization inputs 'B' and 'Scale' are consumed by Add and Mul respectively, both of which support Multi-Directional Broadcasting.
However, in the context of LayerNormalization, I believe only Uni-Directional Broadcasting makes sense: 'B' and 'Scale' should broadcast into their respective other operands (whose shapes are derived from input X), never the other way around.
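To illustrate the point, here is a minimal NumPy sketch of the decomposition (the function name, `axis` handling, and `epsilon` default are my assumptions, not the spec text): with uni-directional broadcasting, 'Scale' and 'B' broadcast into the normalized tensor's shape, so the output shape always equals X's shape.

```python
import numpy as np

def layer_norm_decomposed(X, scale, B, axis=-1, epsilon=1e-5):
    # Hypothetical sketch of the LayerNormalization op sequence:
    # ReduceMean/Sub/Pow/Div to normalize, then Mul by 'scale', Add 'B'.
    axes = tuple(range(axis % X.ndim, X.ndim))
    mean = X.mean(axis=axes, keepdims=True)
    var = X.var(axis=axes, keepdims=True)
    normalized = (X - mean) / np.sqrt(var + epsilon)
    # Uni-directional broadcasting: 'scale' and 'B' broadcast into the
    # shape derived from X; they must never enlarge the output shape.
    return normalized * scale + B

X = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
scale = np.ones(4, dtype=np.float32)
B = np.zeros(4, dtype=np.float32)
Y = layer_norm_decomposed(X, scale, B)
assert Y.shape == X.shape  # output shape follows X, not a broadcast of all inputs
```

Under multi-directional broadcasting, a 'B' of shape (5, 2, 3, 4) would be legal and would enlarge the output beyond X's shape, which is clearly not the intent of LayerNormalization.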
Does the community agree with this observation?
If so, I recommend adding this clarification to the current op's documentation. Let me know if you need help with this.
In general, I would suggest being more explicit about broadcasting rules wherever they apply.