Description
Motivation
The current vector env API returns one info dict per environment. So if I have 5 envs, the returned info could look like
[{}, {}, {}, {}, {"truncated": True}]
While this API is intuitive, it's not a friendly interface for high-throughput environments such as brax (also see google/brax#164) and envpool. These envs prefer the following paradigm: instead of 5 dictionaries, it's just a single dictionary with an array of length 5.
{"truncated":[False, False, False, False, True]}
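To make the relationship between the two layouts concrete, here is a minimal sketch of turning a batched dict into per-env dicts. The helper name `split_info` is hypothetical and not part of Gym, brax, or envpool. Note that the result is dense: every sub-env gets the key, unlike the sparse Gym example above.

```python
def split_info(info: dict, num_envs: int) -> list:
    """Turn one dict of length-num_envs sequences into num_envs per-env dicts.

    Hypothetical helper for illustration; assumes every value is indexable
    and has exactly num_envs entries.
    """
    return [{key: values[i] for key, values in info.items()} for i in range(num_envs)]


batched = {"truncated": [False, False, False, False, True]}
per_env = split_info(batched, num_envs=5)
# per_env[4] is the info dict of the last sub-env.
```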
Pros and Cons
Brax and envpool's approach feels more ergonomic for high-throughput envs, whereas Gym's approach is maybe more efficient for "sparse" info keys that don't appear often (I could be wrong with this).
Proposal
I propose changing the info API for vectorized environments to adopt brax's current design for info. This might also have implications for wrappers and other APIs. For example, the current vectorized environment API would return info like this (with the RecordEpisodeStatistics wrapper used):
[
    {
        "episode": {
            "r": 75.0,
            "l": 75,
            "t": 36.770559,
        },
        "terminal_observation": array([0.2097086, -0.5355723, -0.21343598, -0.11173592], dtype=float32),
    },
    {},
    {},
    {},
    {},
]
I am not sure what's the best way to handle this... Maybe something like the following?
{
    "episode": {
        "r": [75.0, None, None, None, None],
        "l": [75, None, None, None, None],
        "t": [36.770559, None, None, None, None],
    },
    "terminal_observation": [
        array([0.2097086, -0.5355723, -0.21343598, -0.11173592], dtype=float32),
        None,
        None,
        None,
        None,
    ],
}
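As a rough sketch of how such an aggregated dict could be built from the current per-env list, something like the following might work. The function name `aggregate_infos` is hypothetical, and padding missing entries with None is just one possible convention, not a settled design.

```python
from typing import Any


def aggregate_infos(infos: list) -> dict:
    """Merge per-env info dicts into one dict of per-env lists (hypothetical helper).

    Keys missing from a sub-env's info are padded with None. Nested dicts
    such as "episode" are merged recursively.
    """
    merged: dict = {}
    # Collect keys in first-seen order across all sub-environments.
    keys = list(dict.fromkeys(key for info in infos for key in info))
    for key in keys:
        values: list = [info.get(key) for info in infos]
        if any(isinstance(v, dict) for v in values):
            # Recurse into nested dicts; sub-envs without the key contribute {}.
            merged[key] = aggregate_infos(
                [v if isinstance(v, dict) else {} for v in values]
            )
        else:
            merged[key] = values
    return merged


infos = [
    {"episode": {"r": 75.0, "l": 75, "t": 36.770559}},
    {},
    {},
    {},
    {},
]
merged = aggregate_infos(infos)
```

One drawback of this convention is that lists mixing floats and None cannot be stored as plain numeric arrays, so a real implementation might prefer a default value plus a boolean mask instead.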
Alternative 1
Alternatively, it's possible to do this in a very hacky way: brax or envpool could override the first sub-env's info dict so that it always carries their desired info for all sub-envs, so something like:
[{"truncated":[False, False, False, False, True]}, {}, {}, {}, {}]
This way, we don't have to change the current API; however, it's extremely hacky and unintuitive.
Alternative 2
Alternatively, maybe info should be reserved for something sparse like episode and terminal_observation, and more frequent infos should be accessed via attributes, i.e., envs.get_attr(name) (enabled with #1600). So something like:
root_joints = brax_envs.get_attr("root_joints")
# instead of
# root_joints = info["root_joints"]
ale_lives = envpool_envs.get_attr("ale_lives")
# instead of
# ale_lives = info["ale_lives"]
What are your thoughts? @RedTachyon @JesseFarebro @tristandeleu @XuehaiPan @Trinkle23897 @erikfrey @araffin @Miffyli
Checklist
- I have checked that there is no similar issue in the repo (required)