Sometimes we need to use multiple optimizers for different parameter groups so that we can easily turn the optimization of each group on and off.
However, in the current implementation of BMTrain, every optimizer maintains its own loss scale. To get correct gradients, I either have to put all parameters into one optimizer, or call backward multiple times, once per optimizer with its own scaler (and I'm not sure whether that works; I haven't tried it yet).
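For context, here is a minimal sketch of the setup in question. The `ScaledOptimizer` class is an illustrative stand-in, not BMTrain's actual API; the only thing it takes from the issue is the per-optimizer `scale` attribute:

```python
import torch

class ScaledOptimizer:
    """Illustrative stand-in for a BMTrain-style optimizer that keeps
    its own loss scale (the class and its fields are hypothetical)."""
    def __init__(self, params, scale=65536.0):
        self.params = list(params)
        self.scale = scale  # per-optimizer loss scale

encoder = torch.nn.Linear(8, 8)
decoder = torch.nn.Linear(8, 8)

# One optimizer per parameter group, so each group can be toggled
# independently -- but now there are two different loss scales:
opt_a = ScaledOptimizer(encoder.parameters(), scale=65536.0)
opt_b = ScaledOptimizer(decoder.parameters(), scale=32768.0)

# A single loss.backward() applies exactly one scale to the loss, so
# with two different scales at least one optimizer ends up unscaling
# its gradients by the wrong factor.
```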
So I am requesting a utility that synchronizes the scalers of multiple optimizers. It would take the loss and a list of optimizers as parameters and, as far as I can see, work roughly like this:
```python
min_scale = float("inf")  # initialize
# find the smallest scale among all optimizers
for optimizer in optimizers:
    if optimizer.scale < min_scale:
        min_scale = optimizer.scale
# set every optimizer to that common scale
for optimizer in optimizers:
    optimizer.scale = min_scale
loss = loss * min_scale  # scale the loss
```
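Wrapped up as a function, the utility could look like this (a minimal sketch; `synchronize_scale` is a hypothetical name, not an existing BMTrain API, and it only assumes each optimizer exposes a readable and writable `scale` attribute as above):

```python
def synchronize_scale(loss, optimizers):
    """Align every optimizer on the smallest loss scale, then scale
    the loss by it; returns the scaled loss, ready for backward().
    Hypothetical utility: only assumes each optimizer has a
    readable/writable `scale` attribute."""
    min_scale = min(opt.scale for opt in optimizers)
    for opt in optimizers:
        opt.scale = min_scale
    return loss * min_scale

# Usage sketch:
#   scaled_loss = synchronize_scale(loss, [opt_a, opt_b])
#   scaled_loss.backward()
```

Taking the minimum scale is the conservative choice: if any one scaler has backed off to avoid overflow, all optimizers follow it, so no optimizer sees gradients scaled beyond what its own scaler would have allowed.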