Description
To use Brax with OpenAI Gym, we are open to modifying the info style used for our vector environments.
There are three options for info styles (that I'm aware of), each with its own advantages and disadvantages:
1. Classic (this is currently implemented in Gym)
[
env_0_info,
env_1_info,
...
]
2. Brax (this is the style in Brax)
{
key: np.array([env_0_info[key], env_1_info[key], ...]),
...
}
3. Paired (this is a novel style that has the advantages of both styles, imo)
{
key: {
0: env_0_info[key],
1: env_1_info[key],
...
},
...
}
Example infos
import numpy as np

env_0_info = {'reward': 1}
env_1_info = {'reward': 3}
env_2_info = {'truncation': True}
classic_info = [{'reward': 1}, {'reward': 3}, {'truncation': True}]
brax_info = {
'reward': np.array([1, 3, 0]),
'truncation': np.array([False, False, True])
}
paired_info = {
'reward': {0: 1, 1: 3},
'truncation': {2: True}
}
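As a rough sketch of how the three styles relate, the classic list can be converted into the other two. The helper names `to_paired_style` and `to_brax_style`, and the `no_data` parameter, are hypothetical and exist in neither Gym nor Brax.

```python
import numpy as np


def to_paired_style(classic_info):
    """Group values by key, keeping only the environments that reported that key."""
    paired = {}
    for env_index, env_info in enumerate(classic_info):
        for key, value in env_info.items():
            paired.setdefault(key, {})[env_index] = value
    return paired


def to_brax_style(classic_info, no_data):
    """Stack values into one array per key, filling gaps with a placeholder.

    `no_data` maps each key to its "no data" placeholder (hypothetical parameter).
    """
    keys = {key for env_info in classic_info for key in env_info}
    return {
        key: np.array([env_info.get(key, no_data[key]) for env_info in classic_info])
        for key in keys
    }


classic_info = [{'reward': 1}, {'reward': 3}, {'truncation': True}]
paired_info = to_paired_style(classic_info)
brax_info = to_brax_style(classic_info, no_data={'reward': 0, 'truncation': False})
```

Note that the Brax-style conversion cannot be written without choosing a placeholder per key, whereas the paired conversion needs no such choice.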
One disadvantage of the Brax info style is that it needs "no data" placeholder values (e.g. 0 for ints, None for objects, False for bools).
The problem is that an environment could quite reasonably return 0 as real data, e.g. a wrapper that computes the total reward over an episode, or one that counts the number of times an event occurs.
One way to avoid this could be to switch to an object dtype whenever one of these "no data" values actually occurs as real data in the info, but changing the dtype at runtime seems hacky. Another option could be to raise a warning telling users that a "no data" value has been encountered.
The advantage of the Brax info style is speed: creating the info, and checking whether info exists for an environment, only requires indexing by the environment number.
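To make the ambiguity concrete, here is a small sketch. The `episode_reward` key is made up for illustration, and the object-dtype array is just one possible reading of the workaround described above, not an existing Gym or Brax behaviour.

```python
import numpy as np

# Env 0 genuinely reports a total reward of 0; env 1 reports nothing at all.
classic_info = [{'episode_reward': 0}, {}]

# Integer array with 0 as the placeholder: both entries look identical.
int_style = np.array([info.get('episode_reward', 0) for info in classic_info])

# Object array with None as the placeholder: the real 0 stays distinguishable,
# at the cost of losing the numeric dtype.
obj_style = np.array(
    [info.get('episode_reward', None) for info in classic_info], dtype=object
)

# Which environments actually reported a value?
has_data = [value is not None for value in obj_style]
```

With the integer array there is no way to recover `has_data` after the fact, which is the core of the problem.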
One advantage of the paired info style is that the "no data" problem outlined above does not arise, since the actual data is paired with its environment and any environment without data is simply not included in the dictionary. A second advantage is that finding all environments with particular data is just info[key].keys(). The disadvantage of this style is that checking whether an environment has data requires checking that the environment key exists and then indexing by it. I haven't tested the performance impact of using dictionaries instead of numpy arrays, so this may be important to consider.
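The two lookup patterns could look roughly like this. `brax_lookup` and `paired_lookup` are hypothetical helpers written only to compare the access paths, and the sketch makes no performance claim.

```python
import numpy as np

brax_info = {
    'reward': np.array([1, 3, 0]),
    'truncation': np.array([False, False, True]),
}
paired_info = {
    'reward': {0: 1, 1: 3},
    'truncation': {2: True},
}


def brax_lookup(info, key, env_index):
    """Brax style: a single index, but a placeholder may come back."""
    return info[key][env_index]


def paired_lookup(info, key, env_index, default=None):
    """Paired style: check that the environment key exists, then read it."""
    return info.get(key, {}).get(env_index, default)


# All environments that reported a reward (paired style only).
envs_with_reward = sorted(paired_info['reward'].keys())

placeholder = brax_lookup(brax_info, 'reward', 2)   # placeholder 0, not real data
missing = paired_lookup(paired_info, 'reward', 2)   # default, since env 2 has no reward
```

Timing both lookups with `timeit` over a realistic number of environments would be one way to check the performance question raised above.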
Is there any additional reasoning behind Brax selecting option 2 over option 3?