See https://github.com/pytorch/pytorch/issues/1601 for the previous discussion of [layer normalization](https://arxiv.org/abs/1607.06450).