This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Optimizers memory usage #10928

Description

@jeremiedb

There have been a few comments about memory consumption in the R-package. With large models, the memory footprint can rapidly exhaust the available memory unless a manual gc() is added to the training loop.
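For reference, the workaround looks roughly like this (a minimal sketch; names like `train.data` and `num.epochs` are placeholders, not from the actual training code):

```r
# Hypothetical training loop showing the manual gc() workaround.
for (epoch in 1:num.epochs) {
  train.data$reset()
  while (train.data$iter.next()) {
    batch <- train.data$value()
    # ... forward, backward and weight update on the executor ...
    gc()  # force R to release the temporary NDArrays created this step
  }
}
```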

The problem appears to be the weight update performed by the optimizers: https://github.com/apache/incubator-mxnet/blob/master/R-package/R/model.R#L221

Memory behaves as if the temporary NDArrays were kept in memory.
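The update pattern in question is functional in style, roughly like the following (a simplified sketch of the pattern, not the exact model.R code): each step returns brand-new NDArrays, and the previous ones linger until R's garbage collector runs.

```r
# Simplified sketch of a functional-style SGD update (illustrative
# only): every call allocates fresh NDArrays for the results.
update.weights <- function(weights, grads, lr) {
  lapply(seq_along(weights), function(i) {
    weights[[i]] - lr * grads[[i]]  # allocates a new NDArray each step
  })
}
```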

I looked at how the Python optimizers are designed and took a first attempt at using the mx.nd.sgd.update family of operators.
It seemed to help, but looking at https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L434, I'm confused about how the state of optimizers such as momentum or Adam is updated. I can't see where the state is updated other than at initialization.
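The pattern I tried looks roughly like this (a sketch, assuming the auto-generated mx.nd.sgd.update R binding; without an `out` argument the result is still a new NDArray):

```r
# Sketch of switching the R-side arithmetic to the fused SGD operator.
for (i in seq_along(arg.arrays)) {
  arg.arrays[[i]] <- mx.nd.sgd.update(weight = arg.arrays[[i]],
                                      grad   = grad.arrays[[i]],
                                      lr     = learning.rate)
}
```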

I also tried a direct mutation of the weights, using the out parameter of mx.nd.sgd.update to point to the executor's ref.arg.arrays, but I got the following error after the first update:

 Error in mx.nd.sgd.update(weight = weight, out = out, grad = grad, lr = learning.rate,  : 
  ./ndarray.h:87: RCheck failed: ptr_->writable && !ptr_->moved Passing a read only NDArray to mutate function 
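For completeness, the failing attempt is of this form (a sketch reconstructed from the error above; `exec` stands for the executor): writing the result directly into ref.arg.arrays via `out` trips the writable check, since those NDArrays are exposed read-only on the R side.

```r
# Attempted in-place update into the executor's argument arrays
# (sketch); fails with "Passing a read only NDArray to mutate function".
mx.nd.sgd.update(weight = exec$ref.arg.arrays[[i]],
                 out    = exec$ref.arg.arrays[[i]],
                 grad   = grad.arrays[[i]],
                 lr     = learning.rate)
```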

I'm still missing some pieces to figure out a proper way of fixing the memory consumption of the optimizers. Any help would be very welcome!

@thirdwing
