
[Feature Request] Synchronizing scalers of multiple optimizers #39

@alphaGem

Description


Sometimes we need to use multiple optimizers for different groups of parameters, so that we can easily turn the optimization of each group on and off.

However, in the current implementation of BMTrain, every optimizer has its own loss scale. To make the gradients correct, I either need to put all parameters into one optimizer, or call backward multiple times, once per optimizer with its own scaler (and I'm not sure whether that works; I haven't tried it yet).

So I would like a utility that synchronizes the scalers of multiple optimizers. It would take the loss and a list of optimizers as parameters and work roughly like this, as far as I can see:

min_scale = float("inf")  # initialize
for optimizer in optimizers:
    if optimizer.scale < min_scale:
        min_scale = optimizer.scale
for optimizer in optimizers:
    optimizer.scale = min_scale
loss = loss * min_scale  # scale the loss with the shared scale
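A minimal sketch of what such a utility could look like, assuming every optimizer exposes a writable scale attribute as in the snippet above (the function name sync_loss_scale and its signature are only a suggestion, not an existing BMTrain API):

def sync_loss_scale(loss, optimizers):
    """Synchronize the loss scales of several optimizers and return the scaled loss.

    Assumes each optimizer has a writable ``scale`` attribute; this helper is
    a hypothetical sketch of the requested utility.
    """
    # Use the smallest scale among all optimizers so no gradient overflows.
    min_scale = min(optimizer.scale for optimizer in optimizers)
    for optimizer in optimizers:
        optimizer.scale = min_scale
    # Scale the loss once with the shared scale before calling backward.
    return loss * min_scale

With something like this, a single backward on the returned loss would produce gradients that are consistently scaled for every optimizer in the list.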

Labels: enhancement (New feature or request)