[Doc] Doc revamp (pytorch#782)
vmoens authored Jan 3, 2023
1 parent e958503 commit 45b5d72
Showing 3 changed files with 142 additions and 25 deletions.
35 changes: 19 additions & 16 deletions docs/source/reference/envs.rst
@@ -195,28 +195,31 @@ in the environment. The keys to be included in this inverse transform are passed

Transform
TransformedEnv
BinarizeReward
CatFrames
CatTensors
CenterCrop
Compose
DoubleToFloat
FiniteTensorDictCheck
FlattenObservation
FrameSkipTransform
GrayScale
gSDENoise
NoopResetEnv
ObservationNorm
ObservationTransform
PinMemoryTransform
Resize
RewardClipping
RewardScaling
RewardSum
SqueezeTransform
StepCounter
TensorDictPrimer
ToTensorImage
UnsqueezeTransform
VecNorm
R3MTransform
VIPTransform
VIPRewardTransform
28 changes: 24 additions & 4 deletions knowledge_base/GYM.md
@@ -1,10 +1,30 @@
# Working with gym

## What is OpenAI Gym?

OpenAI Gym is a Python library that provides the tooling for coding and using
environments in RL contexts. The environments can be either simulators or real-world
systems (such as robots or games).
Thanks to its ease of use, Gym has been widely adopted as one of the main APIs for
environment interaction in RL and control.

Historically, Gym was started by OpenAI at [https://github.com/openai/gym](https://github.com/openai/gym).
Since then, OpenAI has ceased to maintain it and the library has been forked into
[Gymnasium](https://github.com/Farama-Foundation/Gymnasium) by the Farama Foundation.

Check the [Gym documentation](https://www.gymlibrary.dev/) for further details
about the installation and usage.

## Versioning
TorchRL is tested against the latest version of gym, and we only guarantee compatibility
with the gym version that was available at the time of release.
The OpenAI Gym library is known to have gone through multiple backward-compatibility-breaking
changes and significant user-facing API modifications.
In practice, TorchRL is tested against gym 0.13 onwards and should work with
any version in between.

However, libraries built around Gym may have a custom env-construction process
that breaks the automatic wrapping performed by the `GymEnv` class. In those cases, it
is best to first create the gym environment and then wrap it using
`torchrl.envs.libs.gym.GymWrapper`, as sketched below.
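
Here is a minimal sketch of that pattern. The `SomeCustomEnv-v0` id is a hypothetical
environment registered by a third-party library; substitute your own.

```python
import gym  # or gymnasium, depending on your setup
from torchrl.envs.libs.gym import GymWrapper

# Build the environment with the third-party library's own machinery,
# then hand it over to TorchRL.
base_env = gym.make("SomeCustomEnv-v0")  # hypothetical env id
env = GymWrapper(base_env)

tensordict = env.reset()                # returns a TensorDict
tensordict = env.rand_step(tensordict)  # take a random step through TorchRL's API
```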

That said, for specific projects we may be willing to work on keeping backward
compatibility with older versions of gym.
If you run into an issue when running TorchRL with a specific version of gym,
feel free to open an issue and we will gladly look into it.
104 changes: 99 additions & 5 deletions knowledge_base/PRO-TIPS.md
@@ -1,7 +1,101 @@
# Pro-tips and Debugging

## Gradient-related errors \[Newcomers\]

Newcomers often face gradient-related issues when coding up an RL algorithm from scratch.
The typical training loop can usually be sketched as follows:
```python
obs = env.reset()

for _ in range(n_training_steps):
    # STEP 1: data collection
    # Get a new datapoint "online"
    observations = []
    actions = []
    others = []
    for _ in range(n_data_per_training):
        with torch.no_grad():
            action = policy(obs)
        obs, *other = env.step(action)
        observations.append(obs)
        actions.append(action)
        others.append(other)
    replay_buffer.extend(observations, actions, others)

    # STEP 2: loss and optimization
    # => compute loss "offline"
    loss = loss_fn(replay_buffer.sample(batch_size))

    loss.backward()
    optim.step()
    optim.zero_grad()
```

A common source of errors is trying to backpropagate through the policy call
that is decorated by the `no_grad()` context manager. In fact, this operation should
(in most cases) not be part of any computational graph. Instead, all the differentiable
operations should be executed inside the `loss_fn(...)` abstraction.
In general, RL is a domain where one should pay close attention to what should be
considered non-differentiable "data" (e.g. environment interactions, advantage and
return computation, the 'denominator' log-probability in PPO) and what should be
considered differentiable loss artifacts (e.g. the value error, the 'numerator'
log-probability in PPO). The sketch below illustrates this separation.
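
As a minimal illustration (the `policy.log_prob` method, the batch layout and the
variable names are assumptions made for the sake of the example, not part of any
specific API), the differentiable quantity is recomputed inside the loss while
everything sampled from the buffer is treated as constant data:

```python
import torch

def loss_fn(batch, policy):
    # Everything sampled from the buffer is plain data: no graph attached.
    obs, action, advantage, old_log_prob = batch
    # Differentiable part: recompute the log-probability with the current policy.
    new_log_prob = policy.log_prob(obs, action)
    ratio = torch.exp(new_log_prob - old_log_prob)  # 'denominator' log-prob is data
    # The advantage is data too; detach defensively in case it carries a graph.
    return -(ratio * advantage.detach()).mean()
```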

Errors to look out for that may be related to this misconception include the following:
- `RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).`
  This error usually appears when a datapoint that is part of a computational graph is used twice
  in the loss function. Some users try to fix this by calling `loss.backward(retain_graph=True)`, but this will lead
  to the next error in this list.
  **Related PyTorch discussions**:
  - [here](https://discuss.pytorch.org/t/how-to-properly-create-a-batch-with-torch-tensor/169217)
  - [here](https://discuss.pytorch.org/t/i-am-training-my-multi-agents-reinforcement-learning-project-and-i-got-an-error-trying-to-backward-through-the-graph-a-second-time/152352)

- `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`
  This typically occurs after the first error has been "fixed" with a `retain_graph=True` flag. Instead, the operation
  that is to be differentiated through should be re-computed in the `loss_fn`.
  Another common cause is that two modules are updated using a shared computational graph (e.g. the policy and the critic).
  In that case the `retain_graph=True` flag should be used, although one should be careful as this
  may accumulate gradients of one loss onto the other. In general, it is better practice to
  re-compute each intermediate value for each loss separately while excluding the parameters
  that are not needed from the specific graph, even if the forward calls of some submodules match.
  **Related PyTorch discussions**:
  - [here](https://discuss.pytorch.org/t/runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation-torch-floattensor-3-1-which-is-output-0-of-tanhbackward-is-at-version-1-expected-version-0-instead/87630)
  - [here](https://discuss.pytorch.org/t/in-place-operation-error-while-training-maddpg/151622)

- Algorithm is not learning / `param.grad` is 0 or None.
  An algorithm not learning can have multiple causes. The first thing to look at
  is the value of the parameter gradients, whose norm should be strictly positive.
  A quick diagnostic is sketched right after this list.
  **Related PyTorch discussions**:
  - [here](https://discuss.pytorch.org/t/multi-threaded-backprop-failing-in-a3c-implementation/157132/5)
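
A minimal diagnostic sketch, assuming the policy is a regular `nn.Module` named `policy`
and that `loss.backward()` has just been called:

```python
# Print per-parameter gradient norms to check that gradients actually flow.
for name, param in policy.named_parameters():
    grad_norm = None if param.grad is None else param.grad.norm().item()
    print(f"{name}: grad norm = {grad_norm}")
```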

## My Training is too slow \[Newcomers / intermediate\]
- RL is known to be CPU-intensive in some instances. Even when running a few
  environments in parallel, you can see a great speed-up by asking for more cores on your cluster
  than the number of environments you're working with (twice as many, for instance). This
  is also and especially true for environments that are rendered (even if they are rendered on GPU).
- The speed of training depends upon several factors and there is no one-size-fits-all
  solution to every problem. The common bottlenecks are:
  - **Data collection**: the simulator speed may affect performance, as can the data
    transformation that follows. Speeding up environment interactions is usually
    done via vectorization (if the simulator enables it, e.g. Brax and other Jax-based
    simulators) or parallelization (which is improperly called vectorized envs in gym
    and other libraries); see the parallel-env sketch after this list. In TorchRL,
    transformations can usually be executed on device.
  - **Replay buffer storage and sampling**: storing items in a replay buffer can
    take time if the underlying operation requires some heavy memory manipulation
    or tedious indexing (e.g. with prioritized replay buffers). Sampling can
    also take a considerable amount of time if the data isn't stored contiguously
    and/or if costly stacking or concatenation operations are performed.
    TorchRL provides efficient contiguous storage solutions and efficient writing
    and sampling solutions in these cases; see the storage sketch after this list.
  - **Advantage computation**: computing advantage functions can also constitute
    a computational bottleneck as these are usually coded using plain for loops.
    If profiling indicates that this operation is taking a considerable amount
    of time, consider using our fully vectorized solutions instead.
  - **Loss computation**: the loss computation and the optimization
    steps are frequently responsible for a significant share of the compute time.
    Some techniques can speed things up. For instance, if multiple target networks
    are being used, using vectorized maps and functional programming (through
    functorch) instead of looping over the model configurations can provide a
    significant speedup.
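
For the data-collection bottleneck above, here is a minimal sketch of running several
environments in parallel with TorchRL, assuming the gym `Pendulum-v1` environment is
installed; the number of workers and the env id are illustrative.

```python
from torchrl.envs import ParallelEnv
from torchrl.envs.libs.gym import GymEnv

if __name__ == "__main__":
    # Run 4 copies of the environment in separate worker processes.
    env = ParallelEnv(4, lambda: GymEnv("Pendulum-v1"))
    # rollout returns a batched TensorDict with a leading dimension of 4 (one per worker).
    rollout = env.rollout(max_steps=100)
    env.close()
```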
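
For the replay-buffer bottleneck above, here is a minimal sketch of a contiguous,
pre-allocated buffer in TorchRL; the keys and sizes are illustrative.

```python
import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, ReplayBuffer

# Pre-allocated, contiguous storage: writing and sampling index into a single
# tensor per key instead of stacking lists of items.
rb = ReplayBuffer(storage=LazyTensorStorage(100_000))

data = TensorDict(
    {"observation": torch.randn(64, 4), "action": torch.randn(64, 2)},
    batch_size=[64],
)
rb.extend(data)         # write a batch of 64 transitions
sample = rb.sample(32)  # sample without costly concatenation
```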
