
Add rnd_ppo.py documentation and refactor #127

@vwxyzjn

Description

rnd_ppo.py is a bit dated, and I recommend refactoring it to match the style of the other PPO scripts, which would include:

  • change the name from rnd_ppo.py to ppo_rnd.py
  • use from gym.wrappers.normalize import RunningMeanStd instead of implementing it ourselves (note the implementation might be a bit different); see the sketch after this list.
  • create a make_env function like
    def make_env(env_id, seed, idx, capture_video, run_name):
        def thunk():
            env = gym.make(env_id)
            env = gym.wrappers.RecordEpisodeStatistics(env)
            if capture_video:
                if idx == 0:
                    env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
            env = NoopResetEnv(env, noop_max=30)
            env = MaxAndSkipEnv(env, skip=4)
            env = EpisodicLifeEnv(env)
            if "FIRE" in env.unwrapped.get_action_meanings():
                env = FireResetEnv(env)
            env = ClipRewardEnv(env)
            env = gym.wrappers.ResizeObservation(env, (84, 84))
            env = gym.wrappers.GrayScaleObservation(env)
            env = gym.wrappers.FrameStack(env, 4)
            env.seed(seed)
            env.action_space.seed(seed)
            env.observation_space.seed(seed)
            return env

        return thunk
  • remove the visualization (i.e., ProbsVisualizationWrapper)
  • use def get_value and def get_action_and_value for the Agent class (see the sketch after this list)
  • remove the custom Flatten class (see the note after this list):

    cleanrl/cleanrl/rnd_ppo.py, lines 706 to 708 in 0b3f8ea:

        class Flatten(nn.Module):
            def forward(self, input):
                return input.view(input.size(0), -1)
  • maybe log the average curiosity_reward instead?
    f"global_step={global_step}, episodic_return={info['episode']['r']}, curiosity_reward={curiosity_rewards[step][idx]}"
  • rename total_reward_per_env to curiosity_return
    total_reward_per_env = np.array(
  • Add an SPS (steps per second) metric (see the sketch after this list).
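
For the RunningMeanStd swap above, a minimal sketch of the intended usage; normalizing the intrinsic rewards by a running standard deviation is an assumption based on common RND implementations, not the final code:

    import numpy as np
    from gym.wrappers.normalize import RunningMeanStd

    # Hypothetical sketch: reuse gym's RunningMeanStd instead of a
    # hand-rolled implementation (note the two may differ slightly).
    reward_rms = RunningMeanStd(shape=())  # running mean/var of a scalar

    def normalize_curiosity_rewards(curiosity_rewards):
        # update the running statistics, then scale by the running std
        reward_rms.update(curiosity_rewards.flatten())
        return curiosity_rewards / np.sqrt(reward_rms.var + 1e-8)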
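
For the Agent refactor, the interface could mirror ppo_atari.py while keeping RND's two critics; the head names (critic_ext, critic_int) and attribute layout below are assumptions for illustration:

    import torch.nn as nn
    from torch.distributions.categorical import Categorical

    class Agent(nn.Module):
        ...  # network, actor, critic_ext, critic_int defined in __init__

        def get_value(self, x):
            hidden = self.network(x / 255.0)
            # RND keeps separate extrinsic and intrinsic value heads
            return self.critic_ext(hidden), self.critic_int(hidden)

        def get_action_and_value(self, x, action=None):
            hidden = self.network(x / 255.0)
            probs = Categorical(logits=self.actor(hidden))
            if action is None:
                action = probs.sample()
            return action, probs.log_prob(action), probs.entropy(), self.critic_ext(hidden), self.critic_int(hidden)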
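
On removing the custom Flatten class: PyTorch's built-in nn.Flatten flattens everything except the batch dimension by default, so it should be a drop-in replacement for the snippet above; a quick check:

    import torch
    import torch.nn as nn

    # nn.Flatten() is equivalent to input.view(input.size(0), -1)
    flatten = nn.Flatten()
    x = torch.randn(8, 32, 7, 7)
    assert flatten(x).shape == (8, 32 * 7 * 7)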
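
For the SPS metric, the usual pattern from the other PPO scripts applies; start_time and the per-rollout step increment below are illustrative:

    import time

    start_time = time.time()
    global_step = 0
    for update in range(3):
        global_step += 128  # e.g., num_envs * num_steps per rollout
        sps = int(global_step / (time.time() - start_time))
        # with a SummaryWriter in scope:
        # writer.add_scalar("charts/SPS", sps, global_step)
        print(f"SPS: {sps}")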

Overall, I suggest selecting ppo_atari.py and rnd_ppo.py and using "Compare Selected" in VS Code to view the diff and minimize the difference between the two files.

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation accordingly.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).
