[Proposal] New rendering API #2540

Closed · jkterry1 opened this issue Dec 22, 2021 · 11 comments

jkterry1 (Collaborator) commented Dec 22, 2021

As mentioned in #2524, the rendering API is the last breaking change I think should be made.

Right now, the rendering API has a few problems:

  • When using frame skipping or similar wrappers, calling .render() after each step does not allow you to extract a smooth video.
  • Many environments can't render after each step (or if they can, doing so is super hacky), but are able to easily return an output at the end. This is the case for things like physics simulation environments, where it makes more sense to save state logs that can then be handed off to separate rendering logic at the end.
  • Relatedly, certain environments can only render after being initialized in a certain way, resulting in weird environment logic to make sure they're always renderable.
  • While this isn't an issue for Gym, in PettingZoo (which is based on Gym's API), it's very unclear to users at which agent's step to call .render().

I think the following would solve all of these problems while also being more user friendly:

  1. Make render an argument to init, defaulting to render=None, where people can specify if they want to render, and if so how, at initialization. This could also be put in reset instead of init; I'm not certain which makes more sense.
  2. Add a collection method to call after you're done acting in the environment: if render should return something (e.g. a list of numpy frames or a video file, as specified by the initialization argument), this method returns it to you. We could call it .collect(), though I'm open to better names for the function.

We could also create a Gym wrapper, say gym.wrappers.legacy_render, that calls existing .render() methods and supports common render arguments (presumably live, video, and frames), so that old environments wouldn't even have to change their code to remain usable.
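A rough sketch of how this could look in practice (hedged: a render argument to init and a .collect() method don't exist in Gym today; the names and values here are just the proposal above, with "frames" taken from the common render arguments mentioned):

import gym

# Hypothetical API from this proposal: `render` is an init argument
# (default None) and `.collect()` returns whatever was recorded.
env = gym.make("CartPole-v1", render="frames")

obs = env.reset()
for _ in range(200):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()

# After acting, collect whatever the environment chose to record,
# e.g. a list of numpy frames or a path to a video file.
frames = env.collect()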

vwxyzjn (Contributor) commented Dec 23, 2021

I agree that fast-throughput envs need a different rendering mechanism, so some changes are warranted. Here are some thoughts.

> When using frame skipping or similar wrappers, calling .render() after each step does not allow you to extract a smooth video.

This is not necessarily true. For example, the following wrapper stack can extract a smooth video:

import gym

# NoopResetEnv, MaxAndSkipEnv, EpisodicLifeEnv, FireResetEnv, and
# ClipRewardEnv are the standard Atari wrappers, e.g. from
# stable_baselines3.common.atari_wrappers.
env = gym.make(gym_id)
env = gym.wrappers.RecordEpisodeStatistics(env)
env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
env = NoopResetEnv(env, noop_max=30)
env = MaxAndSkipEnv(env, skip=4)
env = EpisodicLifeEnv(env)
if "FIRE" in env.unwrapped.get_action_meanings():
    env = FireResetEnv(env)
env = ClipRewardEnv(env)
env = gym.wrappers.ResizeObservation(env, (84, 84))
env = gym.wrappers.GrayScaleObservation(env)
env = gym.wrappers.FrameStack(env, 4)

Also, I think the existing APIs have a few benefits:

  1. Many envs can produce RGB images quite efficiently, so current APIs might be good enough, so to speak.
  2. The existing APIs give a lot of flexibility for customizing visualization. You can easily override env.render('rgb_array') to create custom visualizations. For example, this code produces the following video in this experiment:

(video attachment: openaigym.video.0.1.video007000_c1d7901b.2.mp4)

Perhaps an alternative way to handle the fast-throughput envs is to introduce a custom_render and a custom_render_schedule argument in the init function?
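For concreteness, something like this (custom_render and custom_render_schedule are hypothetical arguments sketched from this comment, not existing Gym parameters):

import gym

def my_render(env):
    # A custom visualization, e.g. overlaying extra diagnostics on
    # the plain RGB frame.
    return env.render(mode="rgb_array")

# Hypothetical: the env would call my_render only every 1000 steps,
# keeping throughput high between renders.
env = gym.make("CartPole-v1", custom_render=my_render,
               custom_render_schedule=1000)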

JesseFarebro (Contributor) commented
@vwxyzjn there are still issues even if you stack the wrappers appropriately. As @jkterry1 pointed out, things like frame skipping become a problem, especially with the human rendering mode. When stacking wrappers, the final call to .render() will ultimately display what's returned after the frame skip.

What I really want to see is more control over the human render mode. I really like the design of Gym3. They have a keyword argument to __init__ called render_mode. If render_mode == "human", the environment can perform the necessary steps to give the user a good rendering experience, e.g., initialize SDL, enable audio, enable HiDPI, resolution scaling, etc. If render_mode == "rgb_array", the info dict returned by step() (and reset() in the future) will have an rgb key containing the frame for that step. The render modes are still exposed via the class variable render_modes, which the Gym base class can default to an empty array.

Using this method for rendering doesn't change much from a wrapper standpoint. You can still override rgb_array in step() or add your own custom render mode (e.g., for distributional RL as you mentioned above). It might even be more natural to override the rendering behaviour: you can conditionally check that an RGB array was rendered (by checking 'rgb' in info), whereas before there was no guarantee that calling super().render() would do anything meaningful.

Furthermore, this method of rendering would align better with the vector environment API. Calling .render() on a vector environment is a little weird, but with the render_mode kwarg you could easily return the rgb image from each environment.
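A minimal sketch of the Gym3-style pattern described above, assuming a render_mode kwarg and an 'rgb' info key (neither is part of Gym's API at the time of writing):

import gym
import numpy as np

class MyEnv(gym.Env):
    # Advertised modes; the Gym base class could default this to [].
    render_modes = ["human", "rgb_array"]

    def __init__(self, render_mode=None):
        assert render_mode is None or render_mode in self.render_modes
        self.render_mode = render_mode
        if render_mode == "human":
            # One-time setup for a good interactive experience:
            # initialize SDL, enable audio, HiDPI, resolution scaling...
            pass

    def step(self, action):
        obs = np.zeros(4, dtype=np.float32)  # dummy observation
        reward, done, info = 0.0, False, {}
        if self.render_mode == "rgb_array":
            # The frame travels with the transition instead of via .render()
            info["rgb"] = np.zeros((64, 64, 3), dtype=np.uint8)
        return obs, reward, done, info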

jkterry1 (Collaborator, Author) commented
After thinking about this more, I do think that it makes more sense to make this an argument to init or reset. @RedTachyon?

RedTachyon (Contributor) commented
I'm fine with it being an argument to init/reset, but we really need to put some more thought into what render is meant to actually do. I think I second @JesseFarebro's point here, that if render_mode is "human", then the video is meant to play live on screen, and if it's "rgb_array", it could be passed to the info dict.

One question would be how to actually make it happen in environments. Will we require every environment to copy-paste a similar boilerplate to handle that logic? Or should we connect some of these things in the base Env class, and have people call super()?

And one more doubt: I'm very much not a fan of using a string as an argument for that; can we just make it an enum? This would make it a tad more annoying to use (i.e. you have to import the enum), but so much more readable and understandable if the intention is to support only these two (or any other predefined) rendering modes.

Perhaps two separate settings? One boolean for "render_human" and one for "render_rgb"? (maybe with better names)
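For illustration, the enum variant from the previous paragraph could be as small as this (a sketch, not a settled design):

from enum import Enum

import gym

class RenderMode(Enum):
    HUMAN = "human"
    RGB_ARRAY = "rgb_array"

# Importing the enum is the extra friction mentioned above, but a typo
# like "rgb_arary" now fails loudly instead of silently doing nothing.
env = gym.make("CartPole-v1", render_mode=RenderMode.RGB_ARRAY)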

jkterry1 (Collaborator, Author) commented
I thought about this more. I'd like to make render an argument to init that accepts strings. When an environment steps in rendered mode, it does literally whatever it wants, which can handle the huge diversity of environments. The environment can then have .collect_render() called on it to return whatever it wants to return, if it should return anything.

The only boilerplate code I'm aware of is for caching and returning stacks of numpy arrays, gifs, or videos. The way to get around that boilerplate would be to provide a util where, if the environment gives it the arrays, it does everything else. This would be basically identical to a compatibility wrapper for the existing render function.
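Sketching that util as a wrapper (RenderCollection and the method names here are placeholders, assuming the legacy rgb_array render mode):

import gym

class RenderCollection(gym.Wrapper):
    """Hypothetical compatibility util: calls the legacy
    .render("rgb_array") after every step and caches the frames so
    .collect_render() can return the stack at the end."""

    def __init__(self, env):
        super().__init__(env)
        self.frames = []

    def step(self, action):
        result = self.env.step(action)
        self.frames.append(self.env.render(mode="rgb_array"))
        return result

    def collect_render(self):
        # Hand back the cached frames and clear the cache
        frames, self.frames = self.frames, []
        return frames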

jkterry1 (Collaborator, Author) commented
@RedTachyon @JesseFarebro does that work for you guys?

jkterry1 (Collaborator, Author) commented
This method would still allow environments to do popups or literally whatever else they could want; it entirely hands it off to the environment, which I think is the right way to go.

RedTachyon (Contributor) commented
Why would the init argument take strings? AFAIK, the only rendering modes that have really appeared anywhere are:

  • Display it on screen
  • Return an RGB array
  • Return something else (e.g. a string for an ASCII art representation)

The "return X" options will anyways require a render() method, however it'd be named. The only thing that could be handled through the init argument would be displaying on screen, so a boolean there would suffice. Though worth keeping in mind that not all environments necessarily support that by default.

Another question: what if the environment needs some additional setup to do RGB array rendering? Should that setup only be performed if we set the init parameter? Overall, I'd definitely focus on convenience/simplicity for this use case, as I'm guessing it's the more frequently used one (i.e. that's the method you use for actually recording stuff).

jkterry1 (Collaborator, Author) commented
We discussed this on a call, but in essence the proposal is for the init argument to let the environment do literally whatever it wants, with a few generally standard strings, and a standard method for collecting returns from render. This is about as honest to the heterogeneity of these environments as possible.

younik mentioned this issue Mar 12, 2022
norabelrose commented
Earlier today I made some proposals on #2671 for modifying the new rendering API to make it more flexible. I'm re-posting them here so that we can keep the discussion on #2671 focused on @younik's specific PR.

Here's the basic proposal:

  • The collect_render() method should actually pop the rendered timesteps off the stack, allowing for multiple separate trajectories to be rendered using the same env object. It seems like this should be possible for any environment.
  • It should be possible to toggle rendering on an env without creating/destroying the env. I'm thinking that this could be conceptually similar to requires_grad on tensors in PyTorch. We could have both a requires_render property, along with convenience context managers to flip this property on or off:
env = gym.make("Cartpole-v1", render=True)
# Take some steps in the env in rendering mode
frames = env.collect_render()

with env.no_render():
    # Take some steps without any rendering overhead

# Turn off rendering without a context manager
env.requires_render = False

# Turn rendering back on contextually
with env.enable_render():
    # Take some steps with rendering enabled
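The context managers themselves would only be a few lines on the base class, e.g. (a sketch assuming the requires_render property proposed above):

from contextlib import contextmanager

class Env:
    requires_render = False

    @contextmanager
    def no_render(self):
        # Temporarily disable rendering, restoring the previous state after
        prev, self.requires_render = self.requires_render, False
        try:
            yield self
        finally:
            self.requires_render = prev

    @contextmanager
    def enable_render(self):
        # Temporarily enable rendering
        prev, self.requires_render = self.requires_render, True
        try:
            yield self
        finally:
            self.requires_render = prev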

It was also suggested by @younik that with this approach, it may not even be necessary to include the render=True flag in __init__ anymore. I think it would be ideal if we could eschew the render flag at initialization, but it's not clear how we could handle the cases that @jkterry1 mentioned earlier in this issue, where

> ...certain environments can only render after being initialized in a certain way, resulting in weird environment logic to make sure they're always renderable.

If we want to support rendering in these environments, then it might be necessary to keep the render flag at initialization. The only alternative I can think of would be to do some kind of lazy loading for these environments, initializing in reset() or something, after the user has had a chance to indicate whether they want to render. But maybe that wouldn't be such a big deal? It would be helpful to see a couple of examples of these environments to know how necessary the render flag in __init__ would be. Regardless, I think this is sort of a side issue; it's more important, IMO, that we allow people to render specified segments of trajectories without re-initializing the env.

Thanks in advance for considering my input; I'm very willing to contribute code to make sure fine-grained rendering is possible in Gym 1.0.

jkterry1 (Collaborator, Author) commented
Closing in favor of #2671
