Skip to content

[feature-request] N-step returns for TD methods #47

@araffin

Description

@araffin

Originally posted by @partiallytyped in hill-a/stable-baselines#821
"
N-step returns allow for much better stability, and improve performance when training DQN, DDPG etc, so it will be quite useful to have this feature.

A simple implementation of this would be as a wrapper around ReplayBuffer so it would work with both Prioritized and Uniform sampling. The wrapper keeps a queue of observed experiences compute the returns and add the experience to the buffer.
"

Roadmap: v1.1+ (see #1 )

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions