Closed
Problem Description
The current PPO implementations can be improved in the following ways.
Changes that do not affect performance:
- Removing the regular advantage calculation in PPO #207
- PPO reward normalization works only for default gamma #203
- Various minor PPO refactors #167
Changes that do affect performance (require re-running openrlbenchmark)