Skip to content

Conversation

ryanrudes
Copy link
Contributor

Description

This adds text spaces to Gym.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pseudo-rnd-thoughts
Copy link
Contributor

pseudo-rnd-thoughts commented Jun 19, 2022

Thanks for your PR, are there any particular environments that we wish to use this space with?
I haven't done any text-based ML in a long time, but could the discrete space not be used as a link to the embedding either on observations or actions?
You have a couple of todos in your Text class, what is your planning in finishing them?

I can tell before you run the CI that the pre-commit section will fail.
Could add a commit for the pre-commit, read here for more details

@RedTachyon
Copy link
Contributor

The idea is kinda interesting, but this feels like something that should start out with a discussion that would guide the implementation, instead of dropping a PR and then we have to simultaneously judge the idea and the implementation.

So before even approaching a code review, can you do a proper pitch of this idea? With the motivation, example uses, and the proposed structure.

@ryanrudes
Copy link
Contributor Author

ryanrudes commented Jun 20, 2022

Sure, @RedTachyon.

I was hoping the Text space to be useful for environments with textual prompts guiding the agent. Take for instance an agent that uses an autoregressive language model to interpret prompts such as "pick up the red block", "rotate wheel clockwise 90 degrees", etc. This would allow for gym to support research at the intersection between RL and NLP/U.

In use, one thing that concerns me is the sample() method, particularly how one might want the space to sample random textual prompts in a certain situation. The implementation I provided does not contain this, but it is mentioned as a TODO, that is, providing the ability to instantiate the class with a regular expression to allow for coherent sampling. Let's say we have an environment featuring a robot arm that will be directed to identify and grasp a particular block on a countertop based on its color. Prompts will be in the format of "lift the red block". But perhaps we might want to promote a bit of generalization and vary the target block color in addition to the language used in the instruction. Instead of "lift", we might want to select a verb randomly from "pick", "raise", "grasp", etc. And we may want to vary the color of the target block between "orange", "yellow", "blue", etc. We might consider building something to allow something similar to the below example:

from gym.spaces import Text, Box
from gym import Env

import numpy as np
import random

class PickAndLift(Env):
    VERBS = ["lift", "pick", "raise", "grasp"]
    COLORS = ["red", "orange", "yellow", "green", "blue", "purple", "pink", "white", "black"]

    def __init__(self):
        self.observation_space = Text(f"{verb} the {color} block")
        self.action_space = Box(low = -np.inf, high = np.inf, shape = (7,), dtype = np.float32)

    def reset(self) -> str:
        verb = random.choice(self.VERBS)
        color = random.choice(self.COLORS)
        return f"{verb} the {color} block".format(verb = verb, color = color)

    ...

I hope this provides some clarity in terms of the intentions/motivations for adding this text modality to gym.spaces

@hannahrawr
Copy link
Contributor

hannahrawr commented Jun 20, 2022

I think the motivation behind this is quite sound, though I think there can be some improvements before this gets merged

  1. regex should definitely be integrated into the space, with at least just a default match-all supported ((?s).* should suffice). Though, one concern is that this might be inconsistent with the charset if we decide to keep both attributes.
    • particularly, it is not clear to me how we might keep both regex and charset and ensure that they are mutually consistent.
    • regex should be able to replace most of the use of charset or _charlist, but sampling might be difficult. There are external libraries on how to perform random sampling of strings from regex (https://github.com/asciimoo/exrex), but it would be a fairly big decision to add this just to support this space (not to mention the differing license).
  2. Maybe a default toy environment could be implemented, so that there is an example to provide intuition to others on how to use it. The PickAndLift example shown looks well thought out enough to expand on.

Copy link
Contributor

@hannahrawr hannahrawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional minor comments on the code itself


def __init__(
self,
max_length: int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be best if max_length could be -1, and that would then bypass the check. This would help if the strings themselves could be arbitrarily long (e.g. if we're passing paragraphs of text for whatever reason), so users won't need to specify an arbitrary upper bound

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't imagine a situation in which a user would genuinely want the sampler to output random combinations of characters between certain length bounds. So, if we constrain sampling to a regular expression, are bounds on the length even necessary at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could remove max_length and leave it to the regex, but if the regex itself does not have a defined length bound, it could match an infinite pattern, and the length of the sampled string will depend on whichever internal limit the sample algorithm might have.

@ryanrudes
Copy link
Contributor Author

@andrewtanJS I'm thinking a card game might be a decent choice for a toy environment that could use the Text space.

@pseudo-rnd-thoughts
Copy link
Contributor

@andrewtanJS @ryanrudes @RedTachyon What changes still need to be made to this PR for it to be reviewed and merged?

@ryanrudes
Copy link
Contributor Author

@pseudo-rnd-thoughts For one, we need to devise and implement a simple toy task that uses the text space. I don't have the time to write this at the moment, perhaps someone else is up to it?

@hannahrawr
Copy link
Contributor

@pseudo-rnd-thoughts it'll need the environment example, as well as the regex implementation if we go with it. After some thought, I wonder if this might just add unnecessary complexity or assumptions to the space, and we could just use the existing charset and sample() with the assumption that any environment implementation will flesh it out if they really need to.

@ryanrudes what are your thoughts on this? Would it be easy to support sampling if we add regex? And ultimately, how would the text space be utilized so that the implemented attributes best support the use cases?

@ryanrudes
Copy link
Contributor Author

@pseudo-rnd-thoughts it'll need the environment example, as well as the regex implementation if we go with it. After some thought, I wonder if this might just add unnecessary complexity or assumptions to the space, and we could just use the existing charset and sample() with the assumption that any environment implementation will flesh it out if they really need to.

@ryanrudes what are your thoughts on this? Would it be easy to support sampling if we add regex? And ultimately, how would the text space be utilized so that the implemented attributes best support the use cases?

I agree that adding the regex sampling would certainly overcomplicate things, and possibly for little to no good reason anyways... if we were to aim for an all-encompassing sampling method that would be appropriate for most environments, I couldn't imagine a way to do that without further complicating things. If possible, perhaps the sample() method should be more just for consistency and less for functionality, since I can't really think of a situation where it would be used besides to visually inspect an example output of the space. As you said, anyone can simply implement their own sampling in their environment. However, there should be some critical initializers to allow for baseline agents to be constructed purely from the metadata of this space. I wonder if a contributor to stable-baselines could provide some advice on this?

@pseudo-rnd-thoughts
Copy link
Contributor

The problem that I see with this space, is for almost all games then a MultiDiscrete space can be used instead of the Text, except in the case of emergent communication where two or more agents "learn" a novel language to play a game. I have a PhD college who would be interested in using this space.
However, these are generally multi-agent problems not single agent problems thus we wouldn't want to add them to Gym.
If @ryanrudes fixed all of the CI and pre-commit issues and @andrewtanJS believed that the general structure is effective then I would be happy to merge the PR.
Allowing researchers to use the space and make feature requests as necessary for the space. Thoughts?

@hannahrawr
Copy link
Contributor

@pseudo-rnd-thoughts that sounds like a good direction to take things. This space should sufficient for general use, and if there's more feature requests that arise we can continue to iterate on it further.

@pseudo-rnd-thoughts pseudo-rnd-thoughts mentioned this pull request Jul 11, 2022
@jkterry1 jkterry1 closed this Jul 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants