# Working with gym

## What is OpenAI Gym?

OpenAI Gym is a Python library that provides the tooling for coding and using
environments in RL contexts. The environments can be either simulators or real-world
systems (such as robots or games).
Thanks to its ease of use, Gym has been widely adopted as one of the main APIs for
environment interaction in RL and control.

Historically, Gym was started by OpenAI at [https://github.com/openai/gym](https://github.com/openai/gym).
Since then, OpenAI has ceased to maintain it and the library has been forked into
[Gymnasium](https://github.com/Farama-Foundation/Gymnasium) by the Farama Foundation.

Check the [Gym documentation](https://www.gymlibrary.dev/) for further details
about installation and usage.

## Versioning

TorchRL is tested against the latest version of gym and we only guarantee compatibility
with the gym version that was available at the time of release.
The OpenAI Gym library is known to have gone through multiple backward-compatibility
(BC) breaking changes and significant user-facing API modifications.
In practice, TorchRL is tested against gym 0.13 and later, and should work with
any version in between.

However, libraries built around Gym may have a custom env construction process
that breaks the automatic wrapping performed by the `GymEnv` class. In those cases, it
is best to first create the gym environment yourself and then wrap it using
`torchrl.envs.libs.gym.GymWrapper`.
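
A minimal sketch of this pattern, using a standard gym environment as a stand-in
for the custom third-party construction process:

```python
# Sketch only: "Pendulum-v1" stands in for whatever custom constructor the
# third-party library exposes.
import gym  # or: import gymnasium as gym
from torchrl.envs.libs.gym import GymWrapper

base_env = gym.make("Pendulum-v1")   # build the env with its own API
env = GymWrapper(base_env)           # hand the instantiated env to TorchRL

td = env.reset()        # returns a TensorDict holding the initial observation
td = env.rand_step(td)  # one random interaction step, also a TensorDict
```
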
That said, for specific projects we may be willing to maintain backward
compatibility with older versions of gym.
If you run into an issue when running TorchRL with a specific version of gym,
feel free to open an issue and we will gladly look into it.

# Pro-tips and Debugging

## Gradient-related errors \[Newcomers\]

Newcomers often face gradient-related issues when coding up an RL algorithm from scratch.
The typical training loop can usually be sketched as follows:
```python
obs = env.reset()

for _ in range(n_training_steps):
    # STEP 1: data collection
    # Get new datapoints "online"
    observations = []
    actions = []
    others = []
    for _ in range(n_data_per_training):
        with torch.no_grad():
            action = policy(obs)
        obs, *other = env.step(action)
        observations.append(obs)
        actions.append(action)
        others.append(other)
    replay_buffer.extend(observations, actions, others)

    # STEP 2: loss and optimization
    # => compute loss "offline"
    loss = loss_fn(replay_buffer.sample(batch_size))

    loss.backward()
    optim.step()
    optim.zero_grad()
```

A series of errors come from trying to backpropagate through the policy call
that is decorated by the `no_grad()` context manager. In fact, this operation should
(in most cases) not be part of any computational graph. Instead, all the differentiable
operations should be executed in the `loss_fn(...)` abstraction.
In general, RL is a domain where one should pay close attention to what should be
considered non-differentiable "data" (e.g. environment interactions, advantage and
return computation, the 'denominator' log-probability in PPO) and what should be
considered differentiable loss artifacts (e.g. the value error, the 'numerator'
log-probability in PPO).
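
As a concrete, hypothetical sketch of this split in a PPO-style objective (the
names `policy_dist`, `old_log_prob` and `advantage` are made up for the example):
only the log-probability re-computed by the current policy inside the loss carries
gradients, while the collected log-probability and the advantage are plain data.

```python
import torch

def ppo_surrogate(policy_dist, action, old_log_prob, advantage, eps=0.2):
    # Collected quantities are data: detach them (or store them under
    # torch.no_grad()) so that no graph is attached to them.
    old_log_prob = old_log_prob.detach()
    advantage = advantage.detach()

    # Differentiable "numerator": re-computed with the current policy.
    new_log_prob = policy_dist.log_prob(action)
    ratio = (new_log_prob - old_log_prob).exp()
    clipped_ratio = ratio.clamp(1 - eps, 1 + eps)
    # Clipped surrogate loss (to be minimized).
    return -torch.min(ratio * advantage, clipped_ratio * advantage).mean()
```
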
Errors to look for that may be related to this misconception include the following:
- `RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).`
  This error usually appears when a datapoint that is part of a computational graph is used twice
  in the loss function. Some users try to fix this by calling `loss.backward(retain_graph=True)`, but this leads
  to the next error in this list.
  **Related PyTorch forum discussions**:
  - [here](https://discuss.pytorch.org/t/how-to-properly-create-a-batch-with-torch-tensor/169217)
  - [here](https://discuss.pytorch.org/t/i-am-training-my-multi-agents-reinforcement-learning-project-and-i-got-an-error-trying-to-backward-through-the-graph-a-second-time/152352)

- `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`
  This typically occurs after one fixes the first error with a `retain_graph=True` flag. Instead, the operation
  that is to be differentiated through should be re-computed in the `loss_fn`.
  Another common reason is that two modules are updated using a shared computational graph (e.g. the policy and the critic).
  In that case the `retain_graph=True` flag should be used, although one should be careful as this
  may accumulate gradients of one loss onto the other. In general, it is better practice to
  re-compute each intermediate value for each loss separately, while excluding the parameters
  that are not needed from the specific graph, even if the forward calls of some submodules match.
  **Related PyTorch forum discussions**:
  - [here](https://discuss.pytorch.org/t/runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation-torch-floattensor-3-1-which-is-output-0-of-tanhbackward-is-at-version-1-expected-version-0-instead/87630)
  - [here](https://discuss.pytorch.org/t/in-place-operation-error-while-training-maddpg/151622)

- Algorithm is not learning / `param.grad` is 0 or None.
  An algorithm not learning can have multiple causes. The first thing to look at
  is the value of the parameter gradients, whose norm should be strictly positive
  (a quick check is sketched right after this list).
  **Related PyTorch forum discussions**:
  - [here](https://discuss.pytorch.org/t/multi-threaded-backprop-failing-in-a3c-implementation/157132/5)
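
A minimal, hypothetical check (assuming `policy` is the `nn.Module` being optimized)
that can be dropped right after `loss.backward()`:

```python
# Print the gradient norm of every parameter: None means the parameter was
# never part of the backward graph, 0.0 means it was but received no signal.
for name, param in policy.named_parameters():
    grad_norm = None if param.grad is None else param.grad.norm().item()
    print(f"{name}: grad norm = {grad_norm}")
```
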
## My Training is too slow \[Newcomers / intermediate\]

- RL is known to be CPU-intensive in some instances. Even when running a few
  environments in parallel, you can see a great speed-up by asking for more cores on your cluster
  than the number of environments you're working with (twice as many, for instance). This
  is also and especially true for environments that are rendered (even if they are rendered on GPU).

- The speed of training depends upon several factors and there is no one-size-fits-all
  solution to every problem. Common bottlenecks are:
  - **Data collection**: the simulator speed may affect performance, as can the data
    transformation that follows. Speeding up environment interactions is usually
    done via vectorization (if the simulator enables it, e.g. Brax and other Jax-based
    simulators) or parallelization (which is improperly called vectorized envs in gym
    and other libraries); a parallel-env sketch follows this list. In TorchRL,
    transformations can usually be executed on device.
  - **Replay buffer storage and sampling**: storing items in a replay buffer can
    take time if the underlying operation requires some heavy memory manipulation
    or tedious indexing (e.g. with prioritized replay buffers). Sampling can
    also take a considerable amount of time if the data isn't stored contiguously
    and/or if costly stacking or concatenation operations are performed.
    TorchRL provides efficient contiguous storage solutions and efficient writing
    and sampling solutions in these cases.
  - **Advantage computation**: computing advantage functions can also constitute
    a computational bottleneck as these are usually coded using plain for loops.
    If profiling indicates that this operation is taking a considerable amount
    of time, consider using our fully vectorized solutions instead.
  - **Loss computation**: the loss computation and the optimization
    steps are frequently responsible for a significant share of the compute time.
    Some techniques can speed things up. For instance, if multiple target networks
    are being used, using vectorized maps and functional programming (through
    functorch) instead of looping over the model configurations can provide a
    significant speedup (see the vectorized-map sketch at the end of this list).
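
On the data-collection side, a minimal sketch of running several environment copies
in parallel worker processes with TorchRL (the environment name and worker count are
made up for the example):

```python
from torchrl.envs import ParallelEnv
from torchrl.envs.libs.gym import GymEnv

if __name__ == "__main__":
    # Four copies of the same env, each running in its own process.
    env = ParallelEnv(4, lambda: GymEnv("Pendulum-v1"))
    td = env.reset()        # batched TensorDict with batch_size [4]
    td = env.rand_step(td)  # one random step in every worker
    env.close()
```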
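
On the loss side, a minimal sketch (not TorchRL's own implementation) of evaluating
an ensemble of target critics in a single vectorized call via `torch.func` (the
functorch APIs in recent PyTorch releases); the ensemble and network sizes are made up:

```python
import copy

import torch
from torch import nn
from torch.func import functional_call, stack_module_state

# Hypothetical ensemble of target critics that would otherwise be queried in a loop.
critics = [nn.Linear(8, 1) for _ in range(5)]
params, buffers = stack_module_state(critics)

# A stateless "base" copy, used only to describe the architecture.
base = copy.deepcopy(critics[0]).to("meta")

def call_single(p, b, x):
    return functional_call(base, (p, b), (x,))

obs = torch.randn(64, 8)
# vmap over the stacked parameter dimension; the same batch of observations
# is broadcast to every ensemble member.
values = torch.vmap(call_single, in_dims=(0, 0, None))(params, buffers, obs)
print(values.shape)  # torch.Size([5, 64, 1])
```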