[feature-request] N-step returns for TD methods

Originally posted by @PartiallyTyped in https://github.com/hill-a/stable-baselines/issues/821
"
N-step returns allow for much better stability and improve performance when training DQN, DDPG, etc., so this feature would be quite useful to have.

A simple implementation of this would be a wrapper around ReplayBuffer, so that it works with both Prioritized and Uniform sampling. The wrapper keeps a queue of observed experiences, computes the n-step returns, and adds the resulting experience to the buffer.
"
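The wrapper idea described above could be sketched roughly as follows. This is only an illustration, not the library's implementation: `NStepWrapper`, `ListBuffer`, and the `add(obs, action, reward, next_obs, done)` signature are hypothetical names and assumptions, and the stand-in buffer exists only to keep the example self-contained.

```python
import random
from collections import deque


class ListBuffer:
    """Hypothetical stand-in for a replay buffer (uniform sampling only)."""

    def __init__(self):
        self.storage = []

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        return random.choices(self.storage, k=batch_size)


class NStepWrapper:
    """Hypothetical wrapper that stores n-step transitions in a wrapped buffer."""

    def __init__(self, buffer, n_steps=3, gamma=0.99):
        self.buffer = buffer  # assumed to expose add(...) and sample(...)
        self.n_steps = n_steps
        self.gamma = gamma
        self.queue = deque()  # pending 1-step transitions, oldest first

    def _flush_oldest(self):
        # Discounted sum of all queued rewards, relative to the oldest transition.
        ret = 0.0
        for k, (_, _, reward, _, _) in enumerate(self.queue):
            ret += (self.gamma ** k) * reward
        obs, action = self.queue[0][0], self.queue[0][1]
        # The n-step transition ends where the newest queued transition ends.
        _, _, _, last_next_obs, last_done = self.queue[-1]
        self.buffer.add(obs, action, ret, last_next_obs, last_done)
        self.queue.popleft()

    def add(self, obs, action, reward, next_obs, done):
        self.queue.append((obs, action, reward, next_obs, done))
        if done:
            # Episode ended: emit every pending transition with a truncated return.
            while self.queue:
                self._flush_oldest()
        elif len(self.queue) == self.n_steps:
            self._flush_oldest()

    def sample(self, *args, **kwargs):
        # Sampling is delegated, so uniform or prioritized buffers both work.
        return self.buffer.sample(*args, **kwargs)
```

With this layout, the learner would bootstrap from `next_obs` with a discount of `gamma ** n_steps` instead of `gamma`. Transitions flushed at the end of an episode carry `done=True`, so no bootstrap term applies to them.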

Roadmap: v1.1+ (see #1)

[feature-request] N-step returns for TD methods #47
