Conversation

jvmncs
Contributor

jvmncs commented Jul 14, 2017

Implementation of Noisy Networks per #2024. The gist link from that issue is now obsolete: the forward pass no longer resamples the noise tensors on each call; instead, I've added a reset_noise method to resample them. I also used self.training to differentiate between train and eval passes. I ran some basic tests to make sure the methods were functioning, but I still need to do more testing. I'm also not sure how to edit the docs; if someone can point me in the right direction for expectations on writing docs, I'd appreciate it.
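For concreteness, here is a minimal sketch of the pattern described above (independent Gaussian noise, resampled only via reset_noise, with self.training switching between noisy and mean weights). It is written against current PyTorch rather than this PR's code, and the class name and details are illustrative:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinearSketch(nn.Module):
    # Sketch of independent (non-factorised) noise; not the PR's exact code.
    def __init__(self, in_features, out_features, std_init=0.017):
        super(NoisyLinearSketch, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight_mu = nn.Parameter(torch.Tensor(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.Tensor(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.Tensor(out_features))
        self.bias_sigma = nn.Parameter(torch.Tensor(out_features))
        mu_range = 1.0 / math.sqrt(in_features)
        self.weight_mu.data.uniform_(-mu_range, mu_range)
        self.weight_sigma.data.fill_(std_init)
        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(std_init)
        self.reset_noise()

    def reset_noise(self):
        # Noise is resampled only here, never inside forward().
        self.weight_epsilon = torch.randn(self.out_features, self.in_features)
        self.bias_epsilon = torch.randn(self.out_features)

    def forward(self, input):
        if self.training:
            # Train pass: perturb the mean parameters with the cached noise.
            weight = self.weight_mu + self.weight_sigma * self.weight_epsilon
            bias = self.bias_mu + self.bias_sigma * self.bias_epsilon
        else:
            # Eval pass: use the mean parameters only.
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(input, weight, bias)

Calling reset_noise between forward passes draws fresh perturbations, while switching to eval() falls back to the mean parameters.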

@@ -12,16 +13,14 @@ class Linear(Module):
     Args:
         in_features: size of each input sample
         out_features: size of each output sample
-        bias: If set to False, the layer will not learn an additive bias.
-            Default: True
+        bias: If set to False, the layer will not learn an additive bias. Default: True

        factorised: whether or not to use factorised noise. Default: True
        std_init: initialization constant for standard deviation component of
            weights. If None, defaults to 0.017 for independent and 0.4 for
            factorised noise.

            self.std_init = 0.017
        else:
            self.std_init = std_init
        self.reset_parameters(bias)

        self.bias_sigma.data.fill_(self.std_init)

    def scale_noise(self, size):
        x = torch.Tensor(size).normal_()

            self.weight_epsilon = Variable(epsilon_out.ger(epsilon_in))
            self.bias_epsilon = Variable(self.scale_noise(self.out_features))
        else:
            self.weight_epsilon = Variable(torch.Tensor(self.out_features, self.in_features).normal_())

class NoisyLinear(Module):
    """Applies a noisy linear transformation to the incoming data:
    :math:`y = (\mu_w + \sigma_w \cdot \epsilon_w)x
    + \mu_b + \sigma_b \cdot \epsilon_b`

    def __repr__(self):
        return self.__class__.__name__ + ' (' \
            + str(self.in_features) + ' -> ' \
            + str(self.out_features) + ')'

        weight: the learnable weights of the module of shape
            (out_features x in_features)
        bias: the learnable bias of the module of shape (out_features)

    Examples::

        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(self.std_init)

    def scale_noise(self, size):

@jvmncs
Contributor Author

jvmncs commented Jul 14, 2017

@soumith is the method documentation at lines 145/6 in a satisfactory format? I can't find a module with a similar example.

@Kaixhin
Contributor

Kaixhin commented Jul 14, 2017

Everything looks good to me now 👍 I'll leave Soumith to help with that bit of documentation and how to sort out testing.

@alykhantejani
Contributor

@jvmancuso I think you just need to add some test cases to the list here: https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L3114

you can follow the style of Linear here: https://github.com/pytorch/pytorch/blob/master/test/common_nn.py#L28
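For reference, such an entry might look roughly like this sketch (the constructor arguments and sizes are illustrative, not taken from the PR):

dict(
    module_name='NoisyLinear',
    constructor_args=(10, 8),
    input_size=(4, 10),
    # no reference_fn: the output is stochastic, but the entry still
    # drives the automatic checks, including the numerical-vs-analytical
    # Jacobian comparison
),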

@alykhantejani
Contributor

@jvmancuso as for the docstring, you should prefix the """ with an r, i.e.

r"""
   this is my docstring, which is now treated as a raw string.
   This means escape characters like \ won't be converted and will
   instead be treated as a literal \
"""

class NoisyLinear(Module):
    """Applies a noisy linear transformation to the incoming data.
    During training:
    :math:`y = (\mu_w + \sigma_w \cdot \epsilon_w)x
    + \mu_b + \sigma_b \cdot \epsilon_b`

        >>> print(output)
        >>> print(output_new)
    """
    def __init__(self, in_features, out_features, bias=True, factorised=True, std_init=None):

@jvmncs
Contributor Author

jvmncs commented Jul 18, 2017

@alykhantejani pretty sure I fixed the math line in the docs at L123.

I'm not sure how a test case like Linear's will work in this instance, since by definition the output of the layer will differ on each forward pass. Specifically, there is no stable reference_fn to use, and I'm not sure how to test the functionality without a custom TestNN object that checks with assertNotEqual instead of assertEqual.

@alykhantejani
Contributor

@jvmancuso You don't have to add the reference_fn field, but adding the entry to the dict will make sure the Jacobian checks (numerical vs. analytical) are done.

@Kaixhin
Contributor

Kaixhin commented Jul 19, 2017

@alykhantejani Apart from the dict tests, I think two custom tests would be good:

  • In training mode, show that output 1, given a fixed input, is reasonably different from output 2 for the same input after reset_noise has been called. There is already an assertNotEqual function available to use.
  • In evaluation mode, show that the output matches a linear layer with the same weights and biases.

It is pretty critical that the module shows these behaviours beyond simply passing the automatic differentiation checks.
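Sketched out, the two tests might look something like this (the module and attribute names follow the diff above and are assumptions about the final API, not its definitive form):

def test_NoisyLinear_noise(self):
    m = nn.NoisyLinear(10, 8)
    input = Variable(torch.randn(4, 10))

    # Training mode: the same input should give a different output
    # once reset_noise resamples the noise tensors.
    m.train()
    output1 = m(input).data.clone()
    m.reset_noise()
    output2 = m(input).data
    self.assertNotEqual(output1, output2)

    # Evaluation mode: the output should match a plain Linear layer
    # sharing the same mean weights and biases.
    m.eval()
    linear = nn.Linear(10, 8)
    linear.weight.data.copy_(m.weight_mu.data)
    linear.bias.data.copy_(m.bias_mu.data)
    self.assertEqual(m(input).data, linear(input).data)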

@alykhantejani
Contributor

@Kaixhin agreed, these can just be added as regular test functions in test_nn.py

@jvmncs
Contributor Author

jvmncs commented Jul 29, 2017

Dug into these most recent failures. Question: what do I do about test_noncontig? It appears to check that copying the layer with deepcopy doesn't change the gradients of the parameters. I think deepcopy might be resampling the noise somewhere; if so, that would definitely make assertEqual fail on the parameter grads. Can somebody else please take a look at this?

Also, no idea why test_Conv2d_backward_twice is failing. Nothing I've done changes the Conv2d module or that test case, and I don't see how any changes I've made would cause it to fail.

@Kaixhin @alykhantejani

@alykhantejani
Contributor

@jvmancuso sorry, I've been away on vacation. I'll try to take a look this week.


@alykhantejani
Contributor

@jvmancuso The issue with test_noncontig is that the function tries to zero the gradients of the params here, which calls this snippet of code; that snippet only zeros gradients for weight and bias, not the other params in this module (weight_mu etc.).

I'm not quite sure why _zero_grad_parameters explicitly names weight and bias, but perhaps it can be replaced with a call to module.zero_grad(). There is a detach() call in _zero_grad_parameters, though, so if that is actually needed the function can instead loop through the module's parameters and manually zero the grads and detach.
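A generic replacement could look something like this sketch, looping over all parameters rather than naming them (not the existing helper's code):

def _zero_grad_parameters(self, module):
    # Cover every parameter (weight_mu, weight_sigma, bias_mu, ...),
    # not just attributes named weight and bias.
    for p in module.parameters():
        if p.grad is not None:
            p.grad.data.zero_()
            p.grad.detach_()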

@soumith @apaszke wdyt?

@jvmancuso as for the other failing test, try pulling in the changes from upstream/master, as you currently have merge conflicts anyway.

@jvmncs
Contributor Author

jvmncs commented Nov 7, 2017

Hi all, it's been a while, and I haven't forgotten about this. I'm doing an analysis of DeepMind's Noisy Nets versus OpenAI's Parameter Space Noise and wanted to revisit this when that's done. I'll be implementing the OpenAI version shortly and wanted to investigate including it in this PR. I'll close this for now and reopen once I've thought that through a bit more.

jvmncs closed this Nov 7, 2017
@Kaixhin
Contributor

Kaixhin commented Nov 13, 2017

@jvmancuso for future reference, this doesn't work with CUDA. I think the best solution is to register weight_epsilon and bias_epsilon as buffers so that when the model is cast to CUDA they are cast as well, and then the generated noise needs to be copied over. You can see an example here.
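Sketched against the attribute names in the diff above (assumed, not final), that approach would look something like:

# in __init__: register the noise tensors as buffers so that
# model.cuda() casts them along with the parameters
self.register_buffer('weight_epsilon', torch.Tensor(out_features, in_features))
self.register_buffer('bias_epsilon', torch.Tensor(out_features))

def reset_noise(self):
    epsilon_in = self.scale_noise(self.in_features)
    epsilon_out = self.scale_noise(self.out_features)
    # copy_ writes the freshly generated noise into the existing
    # buffers, preserving whatever device they live on
    self.weight_epsilon.copy_(epsilon_out.ger(epsilon_in))
    self.bias_epsilon.copy_(self.scale_noise(self.out_features))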

@jvmncs
Contributor Author

jvmncs commented Nov 13, 2017

@Kaixhin I had made a few changes to the code to accommodate that but hadn't committed them. Your solution is more elegant, though; I'll integrate it into my work. Thanks!
