`rnd_ppo.py` is a bit dated, and I recommend refactoring it to match the style of the other PPO scripts, which would include:
- change the name from `rnd_ppo.py` to `ppo_rnd.py`
- use `from gym.wrappers.normalize import RunningMeanStd` instead of implementing it ourselves (note the implementation might be a bit different); see the sketch after this list
- create a `make_env` function like lines 88 to 103 in 0b3f8ea:

  ```python
  def make_env(env_id, seed, idx, capture_video, run_name):
      def thunk():
          env = gym.make(env_id)
          env = gym.wrappers.RecordEpisodeStatistics(env)
          if capture_video:
              if idx == 0:
                  env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
          env = NoopResetEnv(env, noop_max=30)
          env = MaxAndSkipEnv(env, skip=4)
          env = EpisodicLifeEnv(env)
          if "FIRE" in env.unwrapped.get_action_meanings():
              env = FireResetEnv(env)
          env = ClipRewardEnv(env)
          env = gym.wrappers.ResizeObservation(env, (84, 84))
          env = gym.wrappers.GrayScaleObservation(env)
          env = gym.wrappers.FrameStack(env, 4)
  ```

- remove the visualization (i.e., `ProbsVisualizationWrapper`)
- use `def get_value` and `def get_action_and_value` for the `Agent` class; see the sketch after this list
- remove lines 706 to 708 in 0b3f8ea:

  ```python
  class Flatten(nn.Module):
      def forward(self, input):
          return input.view(input.size(0), -1)
  ```

- maybe log the average `curiosity_reward` instead? Line 848 in 0b3f8ea:

  ```python
  f"global_step={global_step}, episodic_return={info['episode']['r']}, curiosity_reward={curiosity_rewards[step][idx]}"
  ```

- rename `total_reward_per_env` to `curiosity_return`. Line 854 in 0b3f8ea:

  ```python
  total_reward_per_env = np.array(
  ```

- add an SPS (steps per second) metric; see the sketch after this list
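For the `RunningMeanStd` item, here is a minimal sketch of how the gym helper could be used, assuming gym's `RunningMeanStd` API (`update()`, `mean`, `var`); the shapes and the normalization constant are illustrative, not taken from `rnd_ppo.py`:

```python
import numpy as np
from gym.wrappers.normalize import RunningMeanStd

# running statistics for the intrinsic (curiosity) rewards and the observations
reward_rms = RunningMeanStd(shape=())
obs_rms = RunningMeanStd(shape=(84, 84, 1))

# update with a batch, then normalize by the running standard deviation
batch_rewards = np.random.randn(128)
reward_rms.update(batch_rewards)
normalized_rewards = batch_rewards / np.sqrt(reward_rms.var + 1e-8)
```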
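For the `get_value` / `get_action_and_value` item, a rough sketch of the method signatures the other PPO scripts use; the MLP torso and constructor arguments here are placeholders for illustration, and the RND variant would likely return both an extrinsic and an intrinsic value from `get_action_and_value`:

```python
import torch
import torch.nn as nn
from torch.distributions.categorical import Categorical


class Agent(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        # placeholder MLP torso; the real script would keep its Atari CNN here
        self.network = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.actor = nn.Linear(256, n_actions)
        self.critic = nn.Linear(256, 1)

    def get_value(self, x):
        return self.critic(self.network(x))

    def get_action_and_value(self, x, action=None):
        hidden = self.network(x)
        probs = Categorical(logits=self.actor(hidden))
        if action is None:
            action = probs.sample()
        return action, probs.log_prob(action), probs.entropy(), self.critic(hidden)


# illustrative usage with a batch of flattened 4x84x84 observations
agent = Agent(obs_dim=4 * 84 * 84, n_actions=6)
obs = torch.zeros((8, 4 * 84 * 84))
action, logprob, entropy, value = agent.get_action_and_value(obs)
```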
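For the SPS item, a small sketch of the metric as the other scripts report it (steps divided by wall-clock time since the start of training); the `SummaryWriter` path and the per-update step increment are placeholders:

```python
import time

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/example")  # the script already creates a writer like this
start_time = time.time()
global_step = 0

for update in range(10):  # stand-in for the training loop
    global_step += 128 * 8  # num_steps * num_envs per update (illustrative numbers)
    sps = int(global_step / (time.time() - start_time))
    print("SPS:", sps)
    writer.add_scalar("charts/SPS", sps, global_step)
```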
Overall, I suggest selecting `ppo_atari.py` and `rnd_ppo.py` and using **Compare Selected** in VS Code to see the file difference and minimize it.
Types of changes
- Bug fix
- New feature
- New algorithm
- Documentation
Checklist:
- I've read the CONTRIBUTION guide (required).
- I have ensured `pre-commit run --all-files` passes (required).
- I have updated the documentation accordingly.
- I have updated the tests accordingly (if applicable).
If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments.
- I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
- I have tracked applicable experiments in openrlbenchmark/cleanrl with the `--capture-video` flag toggled on (required).
- I have updated the documentation and previewed the changes via `mkdocs serve`.
- I have explained noteworthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers (if applicable).
- I have added links to the PR related to the algorithm.
- I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves (in PNG format with `width=500` and `height=300`).
- I have added links to the tracked experiments.
- I have updated the tests accordingly (if applicable).