Conversation

jvmncs
Contributor

jvmncs commented Jul 14, 2017

Implementation of Noisy Networks per #2024. The gist link from that issue is now obsolete: the forward pass no longer resamples the noise tensors on each call; instead, I've added a reset_noise method to resample them. I also used self.training to differentiate between train and eval passes. I ran some basic tests to make sure the methods were functioning, but I still need to do more testing. I'm also not sure how to edit the docs; if someone can point me in the right direction for expectations on writing docs, I'd appreciate it.
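For concreteness, here is a minimal sketch of the pattern described above (independent Gaussian noise, resampled only via reset_noise, with self.training switching between noisy and mean weights). It is written against current PyTorch rather than this PR's code, and the class name and details are illustrative:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinearSketch(nn.Module):
    # Sketch of independent (non-factorised) noise; not the PR's exact code.
    def __init__(self, in_features, out_features, std_init=0.017):
        super(NoisyLinearSketch, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight_mu = nn.Parameter(torch.Tensor(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.Tensor(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.Tensor(out_features))
        self.bias_sigma = nn.Parameter(torch.Tensor(out_features))
        mu_range = 1.0 / math.sqrt(in_features)
        self.weight_mu.data.uniform_(-mu_range, mu_range)
        self.weight_sigma.data.fill_(std_init)
        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(std_init)
        self.reset_noise()

    def reset_noise(self):
        # Noise is resampled only here, never inside forward().
        self.weight_epsilon = torch.randn(self.out_features, self.in_features)
        self.bias_epsilon = torch.randn(self.out_features)

    def forward(self, input):
        if self.training:
            # Train pass: perturb the mean parameters with the cached noise.
            weight = self.weight_mu + self.weight_sigma * self.weight_epsilon
            bias = self.bias_mu + self.bias_sigma * self.bias_epsilon
        else:
            # Eval pass: use the mean parameters only.
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(input, weight, bias)

Calling reset_noise between forward passes draws fresh perturbations, while switching to eval() falls back to the mean parameters.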

@@ -12,16 +13,14 @@ class Linear(Module):
     Args:
         in_features: size of each input sample
         out_features: size of each output sample
-        bias: If set to False, the layer will not learn an additive bias.
-            Default: True
+        bias: If set to False, the layer will not learn an additive bias. Default: True

        factorised: whether or not to use factorised noise. Default: True
        std_init: initialization constant for standard deviation component of
            weights. If None, defaults to 0.017 for independent and 0.4 for
            factorised noise.

            self.std_init = 0.017
        else:
            self.std_init = std_init
        self.reset_parameters(bias)

        self.bias_sigma.data.fill_(self.std_init)

    def scale_noise(self, size):
        x = torch.Tensor(size).normal_()

            self.weight_epsilon = Variable(epsilon_out.ger(epsilon_in))
            self.bias_epsilon = Variable(self.scale_noise(self.out_features))
        else:
            self.weight_epsilon = Variable(torch.Tensor(self.out_features, self.in_features).normal_())

class NoisyLinear(Module):
    """Applies a noisy linear transformation to the incoming data:
    :math:`y = (\mu_w + \sigma_w \cdot \epsilon_w)x
    + \mu_b + \sigma_b \cdot \epsilon_b`

    def __repr__(self):
        return self.__class__.__name__ + ' (' \
            + str(self.in_features) + ' -> ' \
            + str(self.out_features) + ')'

        weight: the learnable weights of the module of shape
            (out_features x in_features)
        bias: the learnable bias of the module of shape (out_features)

    Examples::

        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(self.std_init)

    def scale_noise(self, size):

@jvmncs
Contributor Author

jvmncs commented Jul 14, 2017

@soumith is the method documentation at lines 145/6 in a satisfactory format? I can't find a module with a similar example.

@Kaixhin
Contributor

Kaixhin commented Jul 14, 2017

Everything looks good to me now 👍 I'll leave Soumith to help with that bit of documentation and how to sort out testing.

@alykhantejani
Contributor

@jvmancuso I think you just need to add some test cases to the list here: https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L3114

you can follow the style of Linear here: https://github.com/pytorch/pytorch/blob/master/test/common_nn.py#L28
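For reference, such an entry might look roughly like this sketch (the constructor arguments and sizes are illustrative, not taken from the PR):

dict(
    module_name='NoisyLinear',
    constructor_args=(10, 8),
    input_size=(4, 10),
    # no reference_fn: the output is stochastic, but the entry still
    # drives the automatic checks, including the numerical-vs-analytical
    # Jacobian comparison
),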

@alykhantejani
Contributor

@jvmancuso as for the docstring, you should prefix the """ with an r, i.e.

r"""
   this is my docstring, which is now treated as a raw string.
   This means escape characters like \ won't be converted and will
   instead be treated as a literal \
"""

class NoisyLinear(Module):
    """Applies a noisy linear transformation to the incoming data.
    During training:
    :math:`y = (\mu_w + \sigma_w \cdot \epsilon_w)x
    + \mu_b + \sigma_b \cdot \epsilon_b`

        >>> print(output)
        >>> print(output_new)
    """
    def __init__(self, in_features, out_features, bias=True, factorised=True, std_init=None):

@jvmncs
Contributor Author

jvmncs commented Jul 18, 2017

@alykhantejani pretty sure I fixed the math line in the docs at L123.

I'm not sure how a test case like Linear's will work in this instance, since by definition the output of the layer will differ on each forward pass. Specifically, there is no stable reference_fn to use, and I'm not sure how to test the functionality without a custom TestNN object that checks with assertNotEqual instead of assertEqual.

@alykhantejani
Contributor

@jvmancuso You don't have to add the reference_fn field, but adding the entry to the dict will make sure the Jacobian checks (numerical vs. analytical) are done.

@Kaixhin
Contributor

Kaixhin commented Jul 19, 2017

@alykhantejani Apart from the dict tests, I think two custom tests would be good:

  • In training mode, show that output 1, given a fixed input, is reasonably different from output 2 for the same input after reset_noise has been called. There is already an assertNotEqual function available to use.
  • In evaluation mode, show that the output matches a linear layer with the same weights and biases.

It is pretty critical that the module shows these behaviours beyond simply passing the automatic differentiation checks.
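Sketched out, the two tests might look something like this (the module and attribute names follow the diff above and are assumptions about the final API, not its definitive form):

def test_NoisyLinear_noise(self):
    m = nn.NoisyLinear(10, 8)
    input = Variable(torch.randn(4, 10))

    # Training mode: the same input should give a different output
    # once reset_noise resamples the noise tensors.
    m.train()
    output1 = m(input).data.clone()
    m.reset_noise()
    output2 = m(input).data
    self.assertNotEqual(output1, output2)

    # Evaluation mode: the output should match a plain Linear layer
    # sharing the same mean weights and biases.
    m.eval()
    linear = nn.Linear(10, 8)
    linear.weight.data.copy_(m.weight_mu.data)
    linear.bias.data.copy_(m.bias_mu.data)
    self.assertEqual(m(input).data, linear(input).data)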

@alykhantejani
Contributor

@Kaixhin agreed, these can just be added as regular test functions in test_nn.py

@jvmncs
Contributor Author

jvmncs commented Jul 29, 2017

Dug into these most recent failures. Question: what do I do about test_noncontig? It appears to check that copying the layer with deepcopy doesn't change the gradients of the parameters. I think deepcopy might be resampling the noise somewhere; if so, that would definitely make assertEqual fail on the parameter grads. Can somebody else please take a look at this?

Also, no idea why test_Conv2d_backward_twice is failing. Nothing I've done changes the Conv2d module or that test case, and I don't see how any changes I've made would cause it to fail.

@Kaixhin @alykhantejani

@alykhantejani
Contributor

@jvmancuso sorry, I've been away on vacation. I'll try to take a look this week.


@alykhantejani
Contributor

@jvmancuso The issue with test_noncontig is that the function tries to zero the gradients of the params here, which calls this snippet of code; that snippet only zeros gradients for weight and bias, not the other params in this module (weight_mu etc.).

I'm not quite sure why _zero_grad_parameters explicitly names weight and bias, but perhaps it can be replaced with a call to module.zero_grad(). There is a detach() call in _zero_grad_parameters, though, so if that is actually needed the function can instead loop through the module's parameters and manually zero the grads and detach.
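A generic replacement could look something like this sketch, looping over all parameters rather than naming them (not the existing helper's code):

def _zero_grad_parameters(self, module):
    # Cover every parameter (weight_mu, weight_sigma, bias_mu, ...),
    # not just attributes named weight and bias.
    for p in module.parameters():
        if p.grad is not None:
            p.grad.data.zero_()
            p.grad.detach_()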

@soumith @apaszke wdyt?

@jvmancuso as for the other failing test, try pulling in the changes from upstream/master, as you currently have merge conflicts anyway.

@jvmncs
Contributor Author

jvmncs commented Nov 7, 2017

Hi all, it's been a while, and I haven't forgotten about this. I'm doing an analysis of DeepMind's Noisy Nets versus OpenAI's Parameter Space Noise and wanted to revisit this when that's done. I'll be implementing the OpenAI version shortly and wanted to investigate including it in this PR. I'll close this for now and reopen once I've thought that through a bit more.

jvmncs closed this Nov 7, 2017
@Kaixhin
Contributor

Kaixhin commented Nov 13, 2017

@jvmancuso for future reference, this doesn't work with CUDA. I think the best solution is to register weight_epsilon and bias_epsilon as buffers so that when the model is cast to CUDA they are cast as well, and then the generated noise needs to be copied over. You can see an example here.
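Sketched against the attribute names in the diff above (assumed, not final), that approach would look something like:

# in __init__: register the noise tensors as buffers so that
# model.cuda() casts them along with the parameters
self.register_buffer('weight_epsilon', torch.Tensor(out_features, in_features))
self.register_buffer('bias_epsilon', torch.Tensor(out_features))

def reset_noise(self):
    epsilon_in = self.scale_noise(self.in_features)
    epsilon_out = self.scale_noise(self.out_features)
    # copy_ writes the freshly generated noise into the existing
    # buffers, preserving whatever device they live on
    self.weight_epsilon.copy_(epsilon_out.ger(epsilon_in))
    self.bias_epsilon.copy_(self.scale_noise(self.out_features))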

@jvmncs
Contributor Author

jvmncs commented Nov 13, 2017

@Kaixhin I had made a few changes to the code to accommodate that but hadn't committed them. Your solution is more elegant, though; I'll integrate it into my work. Thanks!
