Remove the value function clipping #208

@vwxyzjn

Problem description

Per Andrychowicz et al. (2021) and anecdotal evidence, value function clipping does not help performance. Hence we should remove the following code.

cleanrl/cleanrl/ppo.py

Lines 283 to 291 in 94a685d

v_loss_unclipped = (newvalue - b_returns[mb_inds]) ** 2
v_clipped = b_values[mb_inds] + torch.clamp(
    newvalue - b_values[mb_inds],
    -args.clip_coef,
    args.clip_coef,
)
v_loss_clipped = (v_clipped - b_returns[mb_inds]) ** 2
v_loss_max = torch.max(v_loss_unclipped, v_loss_clipped)
v_loss = 0.5 * v_loss_max.mean()
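
For reference, here is a minimal sketch of what removing the clipping would leave behind. It uses plain Python floats instead of torch tensors so it runs without dependencies; the function names `value_loss_unclipped` and `value_loss_clipped` are illustrative and do not exist in CleanRL. The clipped variant mirrors the snippet above, and the unclipped variant is simply half the mean squared error, which is presumably what `v_loss` would reduce to.

```python
def value_loss_unclipped(new_values, returns):
    """0.5 * mean squared error between new value predictions and returns."""
    squared_errors = [(v - r) ** 2 for v, r in zip(new_values, returns)]
    return 0.5 * sum(squared_errors) / len(squared_errors)


def value_loss_clipped(new_values, old_values, returns, clip_coef):
    """Plain-float mirror of the clipped value loss quoted in this issue."""
    losses = []
    for v, v_old, r in zip(new_values, old_values, returns):
        # Same role as torch.clamp(newvalue - old_value, -clip_coef, clip_coef)
        delta = max(-clip_coef, min(clip_coef, v - v_old))
        v_clipped = v_old + delta
        # Element-wise max of the unclipped and clipped squared errors
        losses.append(max((v - r) ** 2, (v_clipped - r) ** 2))
    return 0.5 * sum(losses) / len(losses)


# Made-up toy inputs, only to exercise the two functions.
new_values = [1.0, 2.0]
old_values = [0.0, 2.0]
returns = [1.0, 3.0]

unclipped = value_loss_unclipped(new_values, returns)
clipped = value_loss_clipped(new_values, old_values, returns, clip_coef=0.2)
print(unclipped, clipped)
```

Because the clipped loss takes an element-wise maximum that includes the unclipped term, it can never be smaller than the unclipped loss; the two coincide whenever the new prediction stays within `clip_coef` of the old one.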

We should do this with great care: benchmark experiments must confirm that the removal results in the same or better performance in the games we test. That is, we should re-run the following and confirm that performance is OK.

# export WANDB_ENTITY=openrlbenchmark

poetry install
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids CartPole-v1 Acrobot-v1 MountainCar-v0 \
    --command "poetry run python cleanrl/ppo.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 9

poetry install -E atari
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run python cleanrl/ppo_atari.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3

poetry install -E atari
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run python cleanrl/ppo_atari_lstm.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3

poetry install -E envpool
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids Pong-v5 BeamRider-v5 Breakout-v5 \
    --command "poetry run python cleanrl/ppo_atari_envpool.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

poetry install -E "mujoco pybullet"
python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
    --command "poetry run python cleanrl/ppo_continuous_action.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 9

poetry install -E procgen
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

poetry install -E atari
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run torchrun --standalone --nnodes=1 --nproc_per_node=2 cleanrl/ppo_atari_multigpu.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

poetry install -E "pettingzoo atari"
poetry run AutoROM --accept-license
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids pong_v3 surround_v2 tennis_v3 \
    --command "poetry run python cleanrl/ppo_pettingzoo_ma_atari.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3
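
One way to make "the same or better performance" concrete is a simple per-environment check on the mean episodic return across seeds. The sketch below is only a suggestion: the helper `no_regression` and its 5% relative tolerance are assumptions, not part of CleanRL or its benchmark utility, and the numbers are placeholders rather than real results.

```python
from statistics import mean


def no_regression(baseline_returns, candidate_returns, tolerance=0.05):
    """Return True if the candidate's mean return across seeds is at least
    the baseline's mean minus a relative tolerance (hypothetical criterion)."""
    base = mean(baseline_returns)
    cand = mean(candidate_returns)
    # abs() keeps the tolerance meaningful for negative returns (e.g. MountainCar)
    return cand >= base - tolerance * abs(base)


# Placeholder per-seed returns, NOT real benchmark results.
baseline = [100.0, 110.0, 90.0]   # with value clipping
candidate = [98.0, 105.0, 95.0]   # without value clipping
print(no_regression(baseline, candidate))
```

With only 3 seeds per environment this is a crude check, so inspecting the learning curves alongside it is still advisable.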
