You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Originally posted by @partiallytyped in hill-a/stable-baselines#821
"
N-step returns allow for much better stability, and improve performance when training DQN, DDPG etc, so it will be quite useful to have this feature.
A simple implementation of this would be as a wrapper around ReplayBuffer so it would work with both Prioritized and Uniform sampling. The wrapper keeps a queue of observed experiences compute the returns and add the experience to the buffer.
"