Vector API - design considerations #32
An additional change would be when …

Hi, would it make sense for …
Thanks for your thoughts on this @MarcCote! Sven here from the RLlib team :)

On the autoreset issue you mentioned: I would prefer autoreset (similar to how DeepMind has always done it), but then the per-episode seeding (and option'ing) would not be accessible anymore. If per-episode seeding and option'ing are really essential for users, then we won't be able to avoid having an additional …
Thanks for your thoughts @sven1977. I'm unfamiliar with how DeepMind has achieved environment vectorisation; could you provide a short summary of the difference between DeepMind's approach and OpenAI's approach (or EnvPool)?

For being able to set the autoreset seed, this is an interesting idea that I hadn't heard of before. I don't think we will include this in the … The current …
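To make the trade-off concrete, here is a minimal sketch of per-episode seeding with the existing synchronous vector API (assuming `gym.vector.SyncVectorEnv` and list-based seeds): an explicit `reset` call can take one seed per sub-environment, but once sub-environments autoreset inside `step` there is no obvious place to supply a seed or options for the new episode.

```python
import gymnasium as gym

# Build a simple synchronous vector env with 4 CartPole instances.
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(4)]
)

# Per-sub-environment seeding works on an explicit reset call...
obs, infos = envs.reset(seed=[0, 1, 2, 3])

# ...but if a sub-environment terminates and autoresets inside `step`,
# the new episode starts without any user-provided seed or options.
actions = envs.action_space.sample()
obs, rewards, terminations, truncations, infos = envs.step(actions)

envs.close()
```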
Discussion on the autoreset behaviour

@RedTachyon @vwxyzjn @araffin Would be interested in your thoughts.

Considering a very simple training loop:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))

    if terminated or truncated:
        obs, info = env.reset()
    else:
        obs, info = next_obs, next_info
```

The current `AutoResetWrapper` changes the loop to:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.wrappers.AutoResetWrapper(env)
obs, info = env.reset()
for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)

    if terminated or truncated:
        replay_buffer.append((obs, info, action, next_info["final_observation"], reward, terminated, truncated))
    else:
        replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))

    obs, info = next_obs, next_info
```

I propose the following change to the autoreset behaviour:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.experimental.wrappers.AutoReset(env)
obs, info = env.reset()
for training_step in range(1_000):
    action = policy(obs)
    next_obs, reward, terminated, truncated, next_info = env.step(action)
    replay_buffer.append((obs, info, action, next_obs, reward, terminated, truncated))

    obs, info = next_obs, next_info
```

Advantages

Disadvantage
I started trying to implement a new, more sane vector API. And then I quickly realized that it is, indeed, as messy as I could have expected, so the code will have to wait for some time.
Here I want to dump my thoughts about how this whole thing should/could look so that we have a discussion going.
Main desired outcome: we can use the vector API to easily create envs vectorized through either simple vectorization, or jax vmapping (or any other fast mechanism). This can give us huge performance improvements for some envs without relying on additional external libraries. For other envs, we default to Sync/Async/EnvPool?
Current situation: vectorization is only possible via Sync/Async, which is slow af, but very general. EnvPool (not officially supported) only works with some envs, but is faster. Other existing options are generally similar to Sync/Async, with their own quirks (e.g. ray in rllib, or the custom implementation in SB3)
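To make the "fast mechanism" case concrete, here is a minimal sketch (the functional env is hypothetical, not an existing Gymnasium environment) of how a purely functional, JAX-based env gets a batched step essentially for free via `jax.vmap`, which is the kind of native vectorization the new API should be able to expose:

```python
import jax
import jax.numpy as jnp

# Hypothetical functional single-env step: the state is one float and the
# episode terminates once it drifts outside [-1, 1].
def step(state, action):
    next_state = state + 0.1 * action
    reward = -jnp.abs(next_state)
    terminated = jnp.abs(next_state) > 1.0
    return next_state, reward, terminated

# A batched step over N independent sub-environments, obtained by vmapping
# the single-env function instead of spawning N processes.
batched_step = jax.vmap(step)

states = jnp.zeros(8)   # 8 sub-environment states
actions = jnp.ones(8)   # one action per sub-environment
states, rewards, terminateds = batched_step(states, actions)
```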
The main complication is wrappers. If an environment provides its own optimized vectorized version, then we can't apply single-env wrappers to it. A nice solution would be an automatic conversion from a `Wrapper` to a `VectorWrapper`, but that seems either very tricky or impossible to do in the general case. Fortunately, many actual wrappers don't need that "general case" treatment.

The hope I see for this is switching to lambda wrappers, at least for some of the existing wrappers. ActionWrappers, ObservationWrappers and RewardWrappers can in principle be stateful, which requires some workarounds to map them over vectorized envs. With lambda wrappers, we can literally just do a map.
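A sketch of why pure lambda-style wrappers compose so easily with vectorization (the function here is illustrative, not part of the proposed API): a stateless observation transform written for a single env can be applied to a whole batch either by broadcasting or by a plain map.

```python
import numpy as np

# Single-env observation transform: a pure function with no wrapper state.
def normalize_observation(obs, low=-1.0, high=1.0):
    return (obs - low) / (high - low)

batched_obs = np.random.uniform(-1.0, 1.0, size=(8, 4))  # 8 sub-envs, 4-dim obs

# Option 1: rely on NumPy broadcasting over the leading batch axis.
normalized = normalize_observation(batched_obs)

# Option 2: an explicit map, for transforms that don't broadcast naturally.
normalized_mapped = np.array([normalize_observation(obs) for obs in batched_obs])
```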
An element that I think will be crucial is different levels of optimization - existing third-party environments and wrappers should work exactly the same way, with the clunky subprocess vecenv approach, unless they do a few extra things to opt in to the improvements.
Another rough edge might be autoreset. Currently this concept is barely present in gym: it's an optional wrapper for single envs, and in that scope it works fine. In the vectorized case, it's more important and a bit more complicated. If we don't have some sort of autoreset by default in vector envs, that makes them borderline useless for many envs (consider CartPole, where the first env instance happens to take 10 steps and the second takes 100 steps - if we only reset after both are terminated, we just lost 45% of the data).
While a vectorized autoreset is trivial with a subprocess-like vector env, that's not the case with e.g. numpy/jax acceleration. While I can see some hacks that maybe would kinda work to add it in some of these cases via wrapper, we might just have to add a requirement that the environment handles autoreset itself. Note that this wouldn't be a breaking change in env design - envs that don't have built-in autoreset can still use the standard vectorization. But if you want to use vectorized wrappers and the more efficient vectorization paradigm, you need to add it.
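A rough sketch of what "the environment handles autoreset itself" could mean for a NumPy-backed batched env (the class is a toy example, not a Gymnasium interface): terminated sub-environments are re-initialized inside `step`, so no sub-environment ever sits idle waiting for the others.

```python
import numpy as np

class BatchedLineWorld:
    """Toy batched env: N agents move on a line, and an episode terminates
    when its agent leaves [-1, 1]. Autoreset is built into step()."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self.state = np.zeros(num_envs)

    def reset(self):
        self.state = np.zeros(self.num_envs)
        return self.state.copy()

    def step(self, actions):
        self.state += 0.1 * actions
        rewards = -np.abs(self.state)
        terminated = np.abs(self.state) > 1.0
        # Built-in autoreset: re-initialize only the terminated sub-envs.
        self.state[terminated] = 0.0
        return self.state.copy(), rewards, terminated
```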
Finally, a question is - how much can we break? I'm not aware of any significant usage of `gym.vector`, though I know it is used at least sometimes. Ideally I'd like to keep the outside API as similar as possible, perhaps even exactly the same (with additional capabilities). But can we change some of the internal semantics that are in principle exposed to the public, but are also just one of the few remaining relics of the past? As I recall, we want to do the vector revamp before 1.0, which is good, because after 1.0 we have to be very careful about breaking stuff.

Below I'm including a braindump of my semi-structured thoughts on this, just to have it recorded here with some additional details (most of this was mentioned above):
- If lambda wrappers are defined for a plain `gym.Env`, we can actually apply them to the underlying `gym.Env`, to which individual wrappers are applied
- The `observation`, `reward` and `action` functions of lambda wrappers can be mapped over a batch, e.g. via `vmap`, `np.array(map)` or `np.array([... for o in obs])`, depending on the optimization level
- Environments could declare what they support, e.g. `self.optimization: Literal["numpy"] | Literal["jax"] | None` (see the sketch below)
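A minimal sketch of how such an optimization flag might be consumed when vectorizing a lambda observation function (the dispatch helper and its name are hypothetical, not an agreed design):

```python
from typing import Callable, Literal, Optional

import numpy as np

def vectorize_observation_fn(
    func: Callable,
    optimization: Optional[Literal["numpy", "jax"]],
) -> Callable:
    """Return a batched version of a single-env observation function,
    chosen according to the env's declared optimization level."""
    if optimization == "jax":
        import jax
        return jax.vmap(func)   # batched via vmap
    if optimization == "numpy":
        return func             # assume the function broadcasts over the batch axis
    # Fallback for unoptimized envs: a generic per-sub-env loop.
    return lambda batch: np.array([func(obs) for obs in batch])
```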
Issues in the meantime:
Questions: