PPO improvements

## Problem Description

The current PPO implementations can be improved in the following ways.

### Changes that do not affect performance
- [x] #207
- [x] #203
- [x] #167

### Changes that affect performance (require re-running openrlbenchmark)
- [ ] #198
- [x] #208