There have been a few comments about memory consumption in the R-package. With large models, the memory footprint can quickly exceed the available memory unless a manual `gc()` is added in the training loop.
Memory behaves as if the temporary NDArrays were kept alive rather than released.
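To illustrate, the kind of training loop I have in mind looks roughly like this (a sketch only; `exec`, `batch`, `param.names`, `learning.rate` and `num.iterations` are placeholders):

```r
for (iter in seq_len(num.iterations)) {
  # feed the current batch into an already-bound executor and run forward/backward
  mx.exec.update.arg.arrays(exec, list(data = batch$data, label = batch$label),
                            match.name = TRUE)
  mx.exec.forward(exec, is.train = TRUE)
  mx.exec.backward(exec)

  # plain SGD step composed from NDArray arithmetic; every `-` and `*`
  # allocates a fresh temporary NDArray
  updated <- lapply(param.names, function(name) {
    exec$ref.arg.arrays[[name]] - learning.rate * exec$ref.grad.arrays[[name]]
  })
  names(updated) <- param.names
  mx.exec.update.arg.arrays(exec, updated, match.name = TRUE)

  gc()  # without this explicit call, the temporaries pile up and memory keeps growing
}
```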
I looked at how the Python optimizers are designed and took a first attempt at using the `mx.nd.sgd.update` family of operators.
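Concretely, that first attempt replaces the arithmetic update step above with the fused operator, along these lines (again a sketch, using the same placeholder names):

```r
# let the fused operator compute the SGD step instead of composing it from
# several temporary NDArrays; the result is still returned as a new NDArray
updated <- lapply(param.names, function(name) {
  mx.nd.sgd.update(weight = exec$ref.arg.arrays[[name]],
                   grad   = exec$ref.grad.arrays[[name]],
                   lr     = learning.rate)
})
names(updated) <- param.names
mx.exec.update.arg.arrays(exec, updated, match.name = TRUE)
```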
It seemed to help, but looking at https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L434, I am confused about how the state of optimizers such as momentum, Adam and the like is updated. I can't see where the state is updated other than at initialization.
I also tried to mutate the weights directly by pointing the `out` parameter of `mx.nd.sgd.update` at the executor's `ref.arg.arrays`.
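The attempt looked roughly like this (a sketch, same placeholder names as above):

```r
for (name in param.names) {
  # write the result straight back into the executor's weight array
  mx.nd.sgd.update(weight = exec$ref.arg.arrays[[name]],
                   grad   = exec$ref.grad.arrays[[name]],
                   lr     = learning.rate,
                   out    = exec$ref.arg.arrays[[name]])
}
```

After the first update it fails with the following message: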
```
Error in mx.nd.sgd.update(weight = weight, out = out, grad = grad, lr = learning.rate, :
  ./ndarray.h:87: RCheck failed: ptr_->writable && !ptr_->moved Passing a read only NDArray to mutate function
```
I'm still missing some pieces to figure out a proper way of fixing the memory consumption of the optimizers. Any help would be much welcome!