This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Optimizers memory usage #10928

Description

@jeremiedb

There have been a few comments about memory consumption in the R-package. With large models, the memory footprint can rapidly exhaust the available memory unless a manual gc() is added to the training loop.
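For reference, the workaround looks roughly like this (a minimal sketch; names like `train.data` and `num.epochs` are placeholders, not from the actual training code):

```r
# Hypothetical training loop showing the manual gc() workaround.
for (epoch in 1:num.epochs) {
  train.data$reset()
  while (train.data$iter.next()) {
    batch <- train.data$value()
    # ... forward, backward and weight update on the executor ...
    gc()  # force R to release the temporary NDArrays created this step
  }
}
```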

The problem appears to be the weight update performed by the optimizers: https://github.com/apache/incubator-mxnet/blob/master/R-package/R/model.R#L221

Memory behaves as if the temporary NDArrays were kept in memory.
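The update pattern in question is functional in style, roughly like the following (a simplified sketch of the pattern, not the exact model.R code): each step returns brand-new NDArrays, and the previous ones linger until R's garbage collector runs.

```r
# Simplified sketch of a functional-style SGD update (illustrative
# only): every call allocates fresh NDArrays for the results.
update.weights <- function(weights, grads, lr) {
  lapply(seq_along(weights), function(i) {
    weights[[i]] - lr * grads[[i]]  # allocates a new NDArray each step
  })
}
```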

I looked at how the Python optimizers are designed and took a first attempt at using the mx.nd.sgd.update family of operators.
It seemed to help, but looking at https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L434, I'm confused about how the state of optimizers such as momentum or Adam is updated. I can't see where the state is updated other than at initialization.
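The pattern I tried looks roughly like this (a sketch, assuming the auto-generated mx.nd.sgd.update R binding; without an `out` argument the result is still a new NDArray):

```r
# Sketch of switching the R-side arithmetic to the fused SGD operator.
for (i in seq_along(arg.arrays)) {
  arg.arrays[[i]] <- mx.nd.sgd.update(weight = arg.arrays[[i]],
                                      grad   = grad.arrays[[i]],
                                      lr     = learning.rate)
}
```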

I also tried a direct mutation of the weights, using the out parameter of mx.nd.sgd.update to point to the executor's ref.arg.arrays, but I got the following error after the first update:

 Error in mx.nd.sgd.update(weight = weight, out = out, grad = grad, lr = learning.rate,  : 
  ./ndarray.h:87: RCheck failed: ptr_->writable && !ptr_->moved Passing a read only NDArray to mutate function 
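For completeness, the failing attempt is of this form (a sketch reconstructed from the error above; `exec` stands for the executor): writing the result directly into ref.arg.arrays via `out` trips the writable check, since those NDArrays are exposed read-only on the R side.

```r
# Attempted in-place update into the executor's argument arrays
# (sketch); fails with "Passing a read only NDArray to mutate function".
mx.nd.sgd.update(weight = exec$ref.arg.arrays[[i]],
                 out    = exec$ref.arg.arrays[[i]],
                 grad   = grad.arrays[[i]],
                 lr     = learning.rate)
```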

I'm still missing some pieces to figure out a proper way of fixing the memory consumption of the optimizers. Any help would be very welcome!

@thirdwing
