
Add rnd_ppo.py documentation and refactor #127

@vwxyzjn

Description

rnd_ppo.py is a bit dated, and I recommend refactoring it to match the style of the other PPO scripts, which would include:

  • change the name from rnd_ppo.py to ppo_rnd.py
  • use from gym.wrappers.normalize import RunningMeanStd instead of implementing it ourselves (note the implementation might be a bit different); see the sketch after this list.
  • create a make_env function like
    def make_env(env_id, seed, idx, capture_video, run_name):
        def thunk():
            env = gym.make(env_id)
            env = gym.wrappers.RecordEpisodeStatistics(env)
            if capture_video:
                if idx == 0:
                    env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
            env = NoopResetEnv(env, noop_max=30)
            env = MaxAndSkipEnv(env, skip=4)
            env = EpisodicLifeEnv(env)
            if "FIRE" in env.unwrapped.get_action_meanings():
                env = FireResetEnv(env)
            env = ClipRewardEnv(env)
            env = gym.wrappers.ResizeObservation(env, (84, 84))
            env = gym.wrappers.GrayScaleObservation(env)
            env = gym.wrappers.FrameStack(env, 4)
            env.seed(seed)
            env.action_space.seed(seed)
            env.observation_space.seed(seed)
            return env

        return thunk
  • remove the visualization (i.e., ProbsVisualizationWrapper)
  • use def get_value and def get_action_and_value for the Agent class (see the sketch after this list)
  • remove the custom Flatten class (see the note after this list):

    cleanrl/cleanrl/rnd_ppo.py, lines 706 to 708 in 0b3f8ea:

        class Flatten(nn.Module):
            def forward(self, input):
                return input.view(input.size(0), -1)
  • maybe log the average curiosity_reward instead?
    f"global_step={global_step}, episodic_return={info['episode']['r']}, curiosity_reward={curiosity_rewards[step][idx]}"
  • rename total_reward_per_env to curiosity_return
    total_reward_per_env = np.array(
  • Add an SPS (steps per second) metric (see the sketch after this list).
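
For the RunningMeanStd swap above, a minimal sketch of the intended usage; normalizing the intrinsic rewards by a running standard deviation is an assumption based on common RND implementations, not the final code:

    import numpy as np
    from gym.wrappers.normalize import RunningMeanStd

    # Hypothetical sketch: reuse gym's RunningMeanStd instead of a
    # hand-rolled implementation (note the two may differ slightly).
    reward_rms = RunningMeanStd(shape=())  # running mean/var of a scalar

    def normalize_curiosity_rewards(curiosity_rewards):
        # update the running statistics, then scale by the running std
        reward_rms.update(curiosity_rewards.flatten())
        return curiosity_rewards / np.sqrt(reward_rms.var + 1e-8)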
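
For the Agent refactor, the interface could mirror ppo_atari.py while keeping RND's two critics; the head names (critic_ext, critic_int) and attribute layout below are assumptions for illustration:

    import torch.nn as nn
    from torch.distributions.categorical import Categorical

    class Agent(nn.Module):
        ...  # network, actor, critic_ext, critic_int defined in __init__

        def get_value(self, x):
            hidden = self.network(x / 255.0)
            # RND keeps separate extrinsic and intrinsic value heads
            return self.critic_ext(hidden), self.critic_int(hidden)

        def get_action_and_value(self, x, action=None):
            hidden = self.network(x / 255.0)
            probs = Categorical(logits=self.actor(hidden))
            if action is None:
                action = probs.sample()
            return action, probs.log_prob(action), probs.entropy(), self.critic_ext(hidden), self.critic_int(hidden)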
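
On removing the custom Flatten class: PyTorch's built-in nn.Flatten flattens everything except the batch dimension by default, so it should be a drop-in replacement for the snippet above; a quick check:

    import torch
    import torch.nn as nn

    # nn.Flatten() is equivalent to input.view(input.size(0), -1)
    flatten = nn.Flatten()
    x = torch.randn(8, 32, 7, 7)
    assert flatten(x).shape == (8, 32 * 7 * 7)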
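
For the SPS metric, the usual pattern from the other PPO scripts applies; start_time and the per-rollout step increment below are illustrative:

    import time

    start_time = time.time()
    global_step = 0
    for update in range(3):
        global_step += 128  # e.g., num_envs * num_steps per rollout
        sps = int(global_step / (time.time() - start_time))
        # with a SummaryWriter in scope:
        # writer.add_scalar("charts/SPS", sps, global_step)
        print(f"SPS: {sps}")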

Overall, I suggest selecting ppo_atari.py and rnd_ppo.py and using "Compare Selected" in VS Code to view the diff and minimize the difference between the two files.

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation accordingly.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).
