Implement Truncated Quantile Critics (TQC)

I'm normally against implementing very recent papers before they prove to be valuable, but I would like to make an exception for this one, especially because of the good results. It was recently accepted at ICML 2020.

Paper: [Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics](https://arxiv.org/abs/2005.04269)
Code: https://github.com/bayesgroup/tqc_pytorch

### Background

This paper builds on SAC, TD3 and [QR-DQN](https://arxiv.org/abs/1710.10044), making use of quantile regression to predict a distribution for the value function (instead of a mean value).
It pools the quantiles predicted by the different critic networks and truncates them, dropping the largest ones to control overestimation (a bit like TD3 taking the minimum of two critics).
This is for continuous actions only.
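
To make the truncation step concrete, here is a minimal pure-Python sketch. The function name `truncated_target_atoms` and the parameter `n_drop_per_critic` are hypothetical names chosen for illustration, not the authors' API, and the sketch leaves out the reward, discount and entropy terms that go into the real target.

```python
def truncated_target_atoms(critic_quantiles, n_drop_per_critic):
    """Pool the quantile estimates of all critics, sort them,
    and drop the largest ones (illustrative sketch only)."""
    # Flatten the per-critic quantiles into one sorted mixture.
    pooled = sorted(q for critic in critic_quantiles for q in critic)
    n_keep = len(pooled) - n_drop_per_critic * len(critic_quantiles)
    if n_keep <= 0:
        raise ValueError("n_drop_per_critic removes every quantile")
    # Keep only the smallest atoms: this is the truncation.
    return pooled[:n_keep]


# Two critics with four quantiles each (made-up numbers).
critics = [
    [1.0, 2.0, 3.0, 10.0],
    [0.5, 2.5, 4.0, 12.0],
]
kept = truncated_target_atoms(critics, n_drop_per_critic=1)
print(kept)  # [0.5, 1.0, 2.0, 2.5, 3.0, 4.0]
```

Dropping one atom per critic removes the two largest values of the pooled mixture (10.0 and 12.0 here), which gives finer control over the pessimism than taking a hard minimum over critics.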

### Pros
I already implemented it in SB3 (https://github.com/DLR-RM/stable-baselines3/tree/feat/tqc). It was pretty straightforward, as I'm using the SAC code for the backbone (I did not remove the duplicated code yet) and the authors' code for the loss. The difference between SAC and TQC is 30 lines (15 for the loss and 15 for the critic code).
Using the SAC hyperparameters from the zoo, I could achieve very good results on PyBullet envs and on BipedalWalkerHardcore (for this env, it reaches maximal performance 10x faster than in my previous experiments).
The good news is that SAC hyperparameters are transferable to this new algorithm.

The loss function can be reused to implement QR-DQN, which is apparently a huge improvement over DQN (with minimal effort).
The authors' code is available online in both TensorFlow and PyTorch, and the results are really good.
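
For reference, the loss shared by TQC and QR-DQN is the quantile Huber loss. Below is a minimal, unvectorized pure-Python sketch for a single state-action pair, assuming a Huber threshold of 1 and quantile midpoints `(i + 0.5) / n`. The function names are hypothetical; a real implementation would be batched in PyTorch.

```python
def huber(u):
    """Huber loss with threshold 1: quadratic near zero, linear beyond."""
    abs_u = abs(u)
    return 0.5 * u * u if abs_u <= 1.0 else abs_u - 0.5


def quantile_huber_loss(pred_quantiles, target_samples):
    """Average quantile Huber loss over all (prediction, target) pairs."""
    n = len(pred_quantiles)
    total = 0.0
    for i, theta in enumerate(pred_quantiles):
        tau = (i + 0.5) / n  # midpoint of the i-th quantile bin
        for y in target_samples:
            u = y - theta  # pairwise TD error
            # Asymmetric weight: tau when under-estimating (u >= 0),
            # 1 - tau when over-estimating (u < 0).
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u)
    return total / (n * len(target_samples))


loss = quantile_huber_loss([0.0, 0.0], [2.0])
print(loss)  # 0.75
```

The asymmetric weight is what pushes each output towards its own quantile of the target distribution, instead of all outputs collapsing to the mean.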

### Cons
It adds a bit of complexity / duplication, but this can be mitigated if it derives from the SAC class.


My question is: should we integrate it for v1.0 (#1) or should we wait?

@hill-a @Miffyli @AdamGleave @erniejunior 
