Sometimes we need to use multiple optimizers for different parameter groups so that we can easily turn the optimization of each group on and off.
However, in the current implementation of BMTrain, every optimizer maintains its own loss scale. To get correct gradients, I either have to put all parameters into one optimizer, or call backward multiple times, once per optimizer with its own scaler (and I'm not sure whether that works; I haven't tried it yet).
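For context, here is a minimal sketch of the setup in question. The `ScaledOptimizer` class is an illustrative stand-in, not BMTrain's actual API; the only thing it takes from the issue is the per-optimizer `scale` attribute:

```python
import torch

class ScaledOptimizer:
    """Illustrative stand-in for a BMTrain-style optimizer that keeps
    its own loss scale (the class and its fields are hypothetical)."""
    def __init__(self, params, scale=65536.0):
        self.params = list(params)
        self.scale = scale  # per-optimizer loss scale

encoder = torch.nn.Linear(8, 8)
decoder = torch.nn.Linear(8, 8)

# One optimizer per parameter group, so each group can be toggled
# independently -- but now there are two different loss scales:
opt_a = ScaledOptimizer(encoder.parameters(), scale=65536.0)
opt_b = ScaledOptimizer(decoder.parameters(), scale=32768.0)

# A single loss.backward() applies exactly one scale to the loss, so
# with two different scales at least one optimizer ends up unscaling
# its gradients by the wrong factor.
```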
So I am requesting a utility that synchronizes the scalers of multiple optimizers. It would take the loss and a list of optimizers as parameters and, as far as I can see, work roughly like this:
```python
min_scale = float("inf")  # initialize
# find the smallest scale among all optimizers
for optimizer in optimizers:
    if optimizer.scale < min_scale:
        min_scale = optimizer.scale
# set every optimizer to that common scale
for optimizer in optimizers:
    optimizer.scale = min_scale
loss = loss * min_scale  # scale the loss
```
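Wrapped up as a function, the utility could look like this (a minimal sketch; `synchronize_scale` is a hypothetical name, not an existing BMTrain API, and it only assumes each optimizer exposes a readable and writable `scale` attribute as above):

```python
def synchronize_scale(loss, optimizers):
    """Align every optimizer on the smallest loss scale, then scale
    the loss by it; returns the scaled loss, ready for backward().
    Hypothetical utility: only assumes each optimizer has a
    readable/writable `scale` attribute."""
    min_scale = min(opt.scale for opt in optimizers)
    for opt in optimizers:
        opt.scale = min_scale
    return loss * min_scale

# Usage sketch:
#   scaled_loss = synchronize_scale(loss, [opt_a, opt_b])
#   scaled_loss.backward()
```

Taking the minimum scale is the conservative choice: if any one scaler has backed off to avoid overflow, all optimizers follow it, so no optimizer sees gradients scaled beyond what its own scaler would have allowed.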