Closed
Description
Describe the feature and the current behavior/state.
The LAMB optimizer has this option, but AdamW does not. This is necessary to train transformer models with Adam.
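To illustrate the requested behavior, here is a minimal sketch in plain Python. It assumes the option in question is a name-based exclusion from weight decay, similar to LAMB's `exclude_from_weight_decay`. The function names, the variable names, and the choice to scale the decay term by the learning rate are all hypothetical and are not taken from the AdamW implementation in this repository.

```python
import math
import re


def use_weight_decay(name, exclude_patterns):
    """Return False if the variable name matches any exclusion regex."""
    return not any(re.search(p, name) for p in exclude_patterns)


def adamw_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01, exclude_patterns=()):
    """One Adam step with decoupled weight decay.

    params and grads map variable names to lists of floats. Variables
    whose name matches an exclusion pattern get no weight decay.
    Assumption: the decay term is scaled by the learning rate here.
    """
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    for name, values in params.items():
        m = state.setdefault(("m", name), [0.0] * len(values))
        v = state.setdefault(("v", name), [0.0] * len(values))
        decay = weight_decay if use_weight_decay(name, exclude_patterns) else 0.0
        for i, g in enumerate(grads[name]):
            # Standard Adam moment estimates with bias correction.
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # The decay is applied to the weights directly, not via the gradient.
            values[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + decay * values[i])
    return params


# Hypothetical variable names: the LayerNorm scale and the bias are excluded.
params = {"dense/kernel": [1.0, -2.0], "dense/bias": [0.5], "layer_norm/gamma": [1.0, 1.0]}
grads = {"dense/kernel": [0.1, -0.1], "dense/bias": [0.0], "layer_norm/gamma": [0.0, 0.0]}
adamw_step(params, grads, state={}, exclude_patterns=["layer_norm", "bias"])
```

With zero gradients, the excluded variables stay exactly where they are, while the kernel still shrinks toward zero. That is the difference that matters for LayerNorm parameters.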
Relevant information
- Are you willing to contribute it (yes/no): no
- Are you willing to maintain it going forward? (yes/no): no
- Is there a relevant academic paper? (if so, where):
- Is there already an implementation in another framework? (if so, where): Yes, TF 1.
- Was it part of tf.contrib? (if so, where): no
Which API type would this fall under (layer, metric, optimizer, etc.)
Optimizer
Who will benefit from this feature?
Users training NLP models with LayerNorm.