Adding Batch L2 Normalization Layer that L2 normalizes rows of input Tensor #260
Conversation
I will give in-line feedback. This has some common patterns that we don't want in nn.
assert(input:dim() == 2, 'only mini-batch supported (2D tensor), got '
   .. input:dim() .. 'D tensor instead')
self.output:resizeAs(input)
local norms = torch.cmul(input, input):sum(2):sqrt()
torch.xxx(...) usually does a malloc for the result tensor.
Mallocs during the training loop of nn are usually quite bad for performance.
Another reason mallocs are bad is that they introduce a synchronization point in CUDA, so you essentially kill any multi-GPU code.
Instead, consider having a buffer:
self.buffer = self.buffer or input.new()
self.norms = self.norms or input.new()
self.norms:sum(self.buffer:cmul(input, input), 2):sqrt()
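To show how that buffer pattern fits into a whole forward pass, here is a minimal sketch (not the PR's final code; the class name `nn.L2NormalizeSketch` and the epsilon are assumptions for illustration). Buffers are created lazily with the input's tensor type and reused on every call, so the training loop does no further mallocs:

```lua
require 'nn'

-- Hypothetical sketch of a batch L2-normalization module using reusable buffers.
local L2Normalize, parent = torch.class('nn.L2NormalizeSketch', 'nn.Module')

function L2Normalize:__init()
   parent.__init(self)
end

function L2Normalize:updateOutput(input)
   assert(input:dim() == 2, 'only mini-batch supported (2D tensor), got '
      .. input:dim() .. 'D tensor instead')
   -- lazily create buffers with the same tensor type as the input
   self.buffer = self.buffer or input.new()
   self.norms  = self.norms  or input.new()
   -- row-wise L2 norms, computed entirely inside the reusable buffers
   self.buffer:cmul(input, input)
   self.norms:sum(self.buffer, 2):sqrt()
   self.norms:add(1e-10)  -- small epsilon (an assumption, not from the PR) to avoid division by zero
   -- divide each row by its norm
   self.output:resizeAs(input):copy(input)
   self.output:cdiv(self.norms:expandAs(input))
   return self.output
end
```

After the first forward call, subsequent calls only resize and fill the existing tensors, which is what avoids the per-iteration mallocs and the CUDA synchronization they would cause.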
Thanks,
some of this was intentional on my part because I'm applying this in a space-limited setting (I'm constantly running out of memory on the GPU), so saving all of these cached intermediates (plus at every step of my LSTM) was costly. But if this is considered preferable, I'll rewrite.
EDIT: I hadn't appreciated the point about multi-GPU code, interesting.
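One way to reconcile the two concerns (a sketch, not something from this PR) is to keep the lazily created buffers for speed during training but release them when memory pressure matters, e.g. between unrolled LSTM steps or before serializing the module. The helper name below is hypothetical; later versions of nn added `module:clearState()` for essentially this purpose:

```lua
-- Hypothetical helper: detach the storages of the cached buffers so a module
-- held at every LSTM time step does not pin large intermediates on the GPU.
-- They are recreated on the next forward call by `self.buffer or input.new()`.
local function releaseBuffers(module)
   if module.buffer then module.buffer:set() end  -- :set() with no args empties the tensor
   if module.norms  then module.norms:set()  end
   collectgarbage()
end
```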
Overall, the barrier to contributing is lower for nnx (nn-experimental), which is here: https://github.com/clementfarabet/lua---nnx. I made a bunch of comments, but I hope they don't stop you from contributing in the future :)
Thanks for the comments! Some of these issues I was aware of and some of them were just silly mistakes (e.g. reshape), sorry about that. I did a rewrite that should now be much more time-efficient (at the cost of space efficiency (due to held caches) and readability), but it works.
…rward pass would incorrectly compute the gradient because of inplace operations
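The commit message refers to a classic pitfall: an in-place operation overwrites a cached quantity that a later term of the gradient still needs. A small, self-contained illustration of that class of bug (hypothetical, not the PR's actual code):

```lua
require 'torch'

-- One row with norm 5; cache x .* x for the gradient computation.
local x      = torch.Tensor{ {3, 4} }
local buffer = torch.cmul(x, x)        -- {9, 16}, needed again later
local norm   = buffer:sum(2):sqrt()    -- {5}

-- BUG: dividing `buffer` in place destroys the cached x .* x ...
buffer:div(norm[1][1])
-- ... so any later gradient term that still needs x .* x now reads
-- {1.8, 3.2} instead of {9, 16} and the backward pass is silently wrong.
print(buffer)
```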
I think it's missing the implementation when
Squashed the commits and pushed to master via 905ea8c.
Yay!! By the way, suggested improvements going forward:
@karpathy, I would suggest
And thank you for the time you took to help the Torch community 😁
Some documentation would be nice.
I totally missed that this PR doesn't have documentation. Indeed, some documentation would be nice...
I'll add docs. I have a patch for
This layer L2-normalizes an n x d Tensor. I tested the implementation on both CPU and GPU and gradient-checked it with jac and also with my (other) custom code. I also compared it to a slower version that uses a loop, as can be seen in this gist. (This batched version is quite a lot faster.) Also, I have only worked with Torch for ~2 weeks, so exercise caution :)