Closed
Description
Problem description
Per Andrychowicz et al. (2021) and anecdotal evidence, value function clipping does not improve performance. We should therefore remove the following code.
Lines 283 to 291 in 94a685d
```python
v_loss_unclipped = (newvalue - b_returns[mb_inds]) ** 2
# Clip the new value prediction to stay within clip_coef of the rollout-time value
v_clipped = b_values[mb_inds] + torch.clamp(
    newvalue - b_values[mb_inds],
    -args.clip_coef,
    args.clip_coef,
)
v_loss_clipped = (v_clipped - b_returns[mb_inds]) ** 2
# Elementwise maximum (the pessimistic choice) of the two squared errors
v_loss_max = torch.max(v_loss_unclipped, v_loss_clipped)
v_loss = 0.5 * v_loss_max.mean()
```
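The proposed change can be sketched without PyTorch. The function below is a minimal, framework-free illustration (plain Python lists instead of tensors) of the loss in the excerpt, with a toggle that falls back to the plain unclipped loss `0.5 * mean((newvalue - returns) ** 2)`. The function name `value_loss` and the `clip_vloss` flag are assumptions made for illustration only.

```python
def value_loss(new_values, old_values, returns, clip_coef, clip_vloss=True):
    """0.5 * mean squared value error, optionally with PPO-style value clipping.

    new_values: predictions from the current network (newvalue)
    old_values: predictions recorded during the rollout (b_values)
    returns:    return targets (b_returns)
    clip_vloss: hypothetical toggle; False gives the plain unclipped loss
    """
    losses = []
    for v_new, v_old, ret in zip(new_values, old_values, returns):
        unclipped = (v_new - ret) ** 2
        if clip_vloss:
            # Keep the prediction within clip_coef of the rollout-time value
            delta = max(-clip_coef, min(clip_coef, v_new - v_old))
            clipped = (v_old + delta - ret) ** 2
            # Elementwise max of the two squared errors, as torch.max does above
            losses.append(max(unclipped, clipped))
        else:
            losses.append(unclipped)
    return 0.5 * sum(losses) / len(losses)


# Example: the prediction moves from 0.0 to 1.0, toward a target of 2.0
print(value_loss([1.0], [0.0], [2.0], clip_coef=0.2))                    # clipped
print(value_loss([1.0], [0.0], [2.0], clip_coef=0.2, clip_vloss=False))  # unclipped
```

In the example, the clipped branch evaluates the error at 0.2 instead of 1.0, so the maximum selects the larger clipped error (loss of about 1.62 versus 0.5 unclipped), even though the new prediction moved toward the target. In the PyTorch version the clamp is saturated in that case, so this sample contributes no gradient to the value prediction.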
We should make this change with great care, by running benchmark experiments that confirm the removal yields the same or better performance on the environments we test. That is, we should re-run the following benchmarks and confirm that performance does not regress.
Lines 1 to 59 in 94a685d
```bash
# export WANDB_ENTITY=openrlbenchmark

# Classic control
poetry install
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids CartPole-v1 Acrobot-v1 MountainCar-v0 \
    --command "poetry run python cleanrl/ppo.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 9

# Atari
poetry install -E atari
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run python cleanrl/ppo_atari.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3

# Atari with LSTM
poetry install -E atari
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run python cleanrl/ppo_atari_lstm.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3

# Atari via EnvPool
poetry install -E envpool
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids Pong-v5 BeamRider-v5 Breakout-v5 \
    --command "poetry run python cleanrl/ppo_atari_envpool.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

# MuJoCo continuous control
poetry install -E "mujoco pybullet"
python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
    --command "poetry run python cleanrl/ppo_continuous_action.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 9

# Procgen
poetry install -E procgen
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

# Atari, multi-GPU (2 processes on one node)
poetry install -E atari
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run torchrun --standalone --nnodes=1 --nproc_per_node=2 cleanrl/ppo_atari_multigpu.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

# PettingZoo multi-agent Atari
poetry install -E "pettingzoo atari"
poetry run AutoROM --accept-license
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids pong_v3 surround_v2 tennis_v3 \
    --command "poetry run python cleanrl/ppo_pettingzoo_ma_atari.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3
```
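One way to decide whether the re-runs "confirm the performance is ok" is a simple per-environment check. The sketch below assumes you have already collected one final episodic return per seed for the baseline (commit 94a685d) and for the variant without value clipping, e.g. exported from the tracked runs; that export step is not shown. The function name `compare` and the one-standard-deviation slack are arbitrary choices for illustration, not a procedure stated in this issue.

```python
from statistics import mean, stdev


def compare(results, num_std=1.0):
    """Flag per-environment regressions between two sets of benchmark runs.

    results: {env_id: (baseline_returns, candidate_returns)}, where each list
             holds one final episodic return per seed.
    An environment passes if the candidate mean is at least the baseline mean
    minus num_std baseline standard deviations (an assumed, arbitrary slack).
    """
    report = {}
    for env_id, (baseline, candidate) in results.items():
        # stdev needs at least two samples; with a single seed use no slack
        slack = num_std * stdev(baseline) if len(baseline) > 1 else 0.0
        report[env_id] = mean(candidate) >= mean(baseline) - slack
    return report
```

With only 3 seeds per environment this check is coarse, so environments mapped to `False` are a prompt to inspect the learning curves rather than a verdict on their own.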