
Adding Batch L2 Normalization Layer that L2 normalizes rows of input Tensor #260


Closed
karpathy wants to merge 4 commits

Conversation

@karpathy (Contributor) commented May 5, 2015

This layer L2-normalizes an n x d Tensor. I tested the implementation on both CPU and GPU, gradient-checked it with jac and also with my (other) custom code, and compared it to a slower version that uses a loop, as can be seen in this gist. (This batched version is quite a lot faster.)

Also, I have only worked with Torch for ~2 weeks so exercise caution :)
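
For concreteness, here is a minimal sketch of the row-wise operation described above (illustrative only: the sizes are made up, and as the review below discusses, the actual module should avoid these intermediate allocations):

-- Minimal sketch of row-wise L2 normalization of an n x d tensor.
require 'torch'
local input  = torch.randn(5, 10)                        -- n x d mini-batch
local norms  = torch.cmul(input, input):sum(2):sqrt()    -- n x 1 row L2 norms
local output = torch.cdiv(input, norms:expandAs(input))  -- divide each row by its norm
print(output:norm(2, 2))                                 -- every row norm should now be ~1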

@soumith (Member) commented May 6, 2015

Will give in-line feedback. This has some common patterns that we don't want in nn.

assert(input:dim() == 2, 'only mini-batch supported (2D tensor), got '
.. input:dim() .. 'D tensor instead')
self.output:resizeAs(input)
local norms = torch.cmul(input,input):sum(2):sqrt()
@soumith (Member) commented on the code above:

torch.xxx(...) usually does a malloc for the result tensor.
Mallocs during the training loop of an nn are usually quite bad for performance.
Another reason mallocs are bad is that they introduce a synchronization point in CUDA, so you essentially kill any multi-GPU code.

Instead, consider having a buffer:

self.buffer = self.buffer or input.new()
self.norms = self.norms or input.new()
self.norms:sum(self.buffer:cmul(input, input), 2):sqrt()
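
To make the suggested buffer pattern concrete, here is a rough sketch of a whole allocation-free forward pass (illustrative only: the module name is made up, and the code that was eventually merged is organized differently):

-- Hypothetical module illustrating the buffer pattern; not the merged code.
require 'nn'
local L2Normalize, parent = torch.class('nn.L2Normalize', 'nn.Module')

function L2Normalize:__init()
   parent.__init(self)
end

function L2Normalize:updateOutput(input)
   assert(input:dim() == 2, 'only mini-batch supported (2D tensor)')
   -- lazily create buffers of the same tensor type as the input;
   -- they are reused on every call, so no malloc inside the training loop
   self.buffer = self.buffer or input.new()
   self.norms  = self.norms or input.new()
   -- norms = sqrt(sum(input .* input, 2)), written into preallocated storage
   self.norms:sum(self.buffer:cmul(input, input), 2):sqrt()
   -- output = input ./ norms (norms broadcast over columns via expandAs)
   self.output:resizeAs(input):copy(input)
   self.output:cdiv(self.norms:expandAs(input))
   return self.output
end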

@karpathy (Contributor, Author) replied:

Thanks,

some of this was intentional on my part because I'm applying this in a space-limited setting (I'm constantly running out of memory on the GPU), so saving all of these cached intermediates (+ at every step of my LSTM) was costly. But if this is considered preferable, I'll rewrite.

EDIT: I had not appreciated the point about multi-GPU code; interesting.

@soumith (Member) commented May 6, 2015

Overall, the barrier to contributing is lower for nnx (nn-experimental), which is here: https://github.com/clementfarabet/lua---nnx

I made a bunch of comments, but I hope they don't stop you from contributing in the future :)

@karpathy (Contributor, Author) commented May 6, 2015

Thanks for the comments! Some of these issues I was aware of, and some were just silly mistakes (e.g. reshape); sorry about that. I did a rewrite that should now be much more time-efficient (at the cost of space efficiency, due to held caches, and readability), but it works.

…rward pass would incorrectly compute the gradient because of inplace operations
@Atcold (Contributor) commented May 6, 2015

I think it's missing the implementation for the case when input:dim() == 1. If a user sends a non-batched input, the module should output the normalised single vector (see this implementation, which uses an approximate gradient computation but works for single samples as well).

@soumith (Member) commented May 13, 2015

Squashed the commits and pushed to master via 905ea8c.

@soumith closed this May 13, 2015
@karpathy (Contributor, Author) commented:

yay!! By the way, suggested improvements going forward:

  • Incorporate a smoothing term to prevent division by zero(?). Or perhaps this should be optional? Not clear.
  • Allow 1-dimensional inputs, as suggested by Atcold.

@Atcold (Contributor) commented May 16, 2015

@karpathy, I would suggest:

  • a default smoothing of 1e-5, as shown here. If the user wants to set something else, then let her do it via an optional parameter [eps] in the module initialisation;
  • if you don't want to write new code, 1-dimensional inputs could be viewed as 1 x length, so if dim(input) == 1 then newInput = input:view(1,-1) (see the sketch after this comment);
  • adding the corresponding documentation, so we avoid having ghost features.

And thank you for the time you took to help the Torch community 😁
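
Building on the hypothetical sketch in the review comment above, the first two suggestions could look roughly like this (again illustrative only; note that updateGradInput would need the same 1D treatment, which is part of why the change is not a one-liner, as karpathy notes below):

-- Continuation of the hypothetical nn.L2Normalize sketch from above.
function L2Normalize:__init(eps)
   parent.__init(self)
   self.eps = eps or 1e-5   -- default smoothing, overridable at construction
end

function L2Normalize:updateOutput(input)
   local is1D = (input:dim() == 1)
   -- treat a single sample as a 1 x d mini-batch
   local x = is1D and input:view(1, -1) or input
   self.buffer = self.buffer or input.new()
   self.norms  = self.norms or input.new()
   -- add eps to the norms to avoid division by zero on all-zero rows
   self.norms:sum(self.buffer:cmul(x, x), 2):sqrt():add(self.eps)
   self.output:resizeAs(x):copy(x):cdiv(self.norms:expandAs(x))
   if is1D then self.output = self.output:view(-1) end   -- restore the original shape
   return self.output
end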

@nicholas-leonard (Member) commented:

Some documentation would be nice.

@soumith (Member) commented May 18, 2015

I totally missed that this PR doesn't have documentation. Indeed some documentation would be nice...

@karpathy (Contributor, Author) commented:

I'll add docs. I have a patch for eps, and I tried to patch in support for the 1D case, but that ended up taking an hour yesterday; it's not as trivial as changing the view with one line of code. I'll work on it.

@fmassa mentioned this pull request Aug 1, 2015