Description
I have been thinking about robust model building on important outliers - outliers that matter, so we don't want the model to pull their predictions closer to the average value. That led me to the idea that implementing learning rate decay would give the model more flexibility: in the beginning it could capture the trivial rules (a trivial decision tree) thanks to the high learning rate, and later in training it could fine-tune the result.
I built my own GBTree with this additional feature and found that the model also learned much faster! Not much code, but a great improvement in performance. I have not tested the robustness (e.g. overfitting) of the model on many datasets, but the one I used showed promising results - no overfitting.
I used learning_rate_start=0.5, learning_rate_min=0.01 and lr_decay=0.95.
First iteration:
lr = learning_rate_start
At each iteration afterwards the following rule applies:
lr = max(learning_rate_min, lr*lr_decay)
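For illustration, here is a minimal sketch of this schedule expressed with the existing LearningRateScheduler callback in the xgboost Python package (so no core change is needed to try it). The parameter names learning_rate_start, learning_rate_min and lr_decay are the hypothetical names from this proposal, not existing XGBoost parameters.

import xgboost as xgb
from sklearn.datasets import make_regression

# Hypothetical parameters from this proposal
learning_rate_start = 0.5
learning_rate_min = 0.01
lr_decay = 0.95

def decayed_lr(boosting_round: int) -> float:
    # Round 0 uses learning_rate_start; each later round multiplies the
    # previous rate by lr_decay, floored at learning_rate_min.
    return max(learning_rate_min, learning_rate_start * lr_decay ** boosting_round)

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train(
    {"objective": "reg:squarederror", "max_depth": 3},
    dtrain,
    num_boost_round=200,
    callbacks=[xgb.callback.LearningRateScheduler(decayed_lr)],
)

With these example values the rate starts at 0.5, drops below 0.1 after roughly 30 rounds, and reaches the 0.01 floor after about 77 rounds, so the early trees make coarse corrections and the later trees fine-tune.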