Experiment with dreamerv3 on polyburn

phisn · May 10, 2024 · 390f863
1 parent f55d320

Showing 101 changed files with 14,892 additions and 16 deletions.
diff --git a/packages/learning-gym/dreamerv3/.dockerignore b/packages/learning-gym/dreamerv3/.dockerignore
@@ -0,0 +1,3 @@
+*.py[cod]
+__pycache__/
+dist
diff --git a/packages/learning-gym/dreamerv3/.gitignore b/packages/learning-gym/dreamerv3/.gitignore
@@ -0,0 +1,7 @@
+.pytest_cache
+dist
+__pycache__/
+*.py[cod]
+*.egg-info
+MUJOCO_LOG.TXT
+;
diff --git a/packages/learning-gym/dreamerv3/LICENSE b/packages/learning-gym/dreamerv3/LICENSE
@@ -0,0 +1,19 @@
+Copyright (c) 2023 Danijar Hafner
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/packages/learning-gym/dreamerv3/MANIFEST.in b/packages/learning-gym/dreamerv3/MANIFEST.in
@@ -0,0 +1 @@
+include dreamerv3/requirements.txt
diff --git a/packages/learning-gym/dreamerv3/README.md b/packages/learning-gym/dreamerv3/README.md
@@ -0,0 +1,127 @@
+# Mastering Diverse Domains through World Models
+
+A reimplementation of [DreamerV3][paper], a scalable and general reinforcement
+learning algorithm that masters a wide range of applications with fixed
+hyperparameters.
+
+![DreamerV3 Tasks](https://user-images.githubusercontent.com/2111293/217647148-cbc522e2-61ad-4553-8e14-1ecdc8d9438b.gif)
+
+If you find this code useful, please cite it in your paper:
+
+```
+@article{hafner2023dreamerv3,
+  title={Mastering Diverse Domains through World Models},
+  author={Hafner, Danijar and Pasukonis, Jurgis and Ba, Jimmy and Lillicrap, Timothy},
+  journal={arXiv preprint arXiv:2301.04104},
+  year={2023}
+}
+```
+
+To learn more:
+
+- [Research paper][paper]
+- [Project website][website]
+- [Twitter summary][tweet]
+
+## DreamerV3
+
+DreamerV3 learns a world model from experiences and uses it to train an actor
+critic policy from imagined trajectories. The world model encodes sensory
+inputs into categorical representations and predicts future representations and
+rewards given actions.
+
+![DreamerV3 Method Diagram](https://user-images.githubusercontent.com/2111293/217355673-4abc0ce5-1a4b-4366-a08d-64754289d659.png)
+
+DreamerV3 masters a wide range of domains with a fixed set of hyperparameters,
+outperforming specialized methods. Removing the need for tuning reduces the
+amount of expert knowledge and computational resources needed to apply
+reinforcement learning.
+
+![DreamerV3 Benchmark Scores](https://github.com/danijar/dreamerv3/assets/2111293/0fe8f1cf-6970-41ea-9efc-e2e2477e7861)
+
+Due to its robustness, DreamerV3 shows favorable scaling properties. Notably,
+using larger models consistently increases not only its final performance but
+also its data-efficiency. Increasing the number of gradient steps further
+increases data efficiency.
+
+![DreamerV3 Scaling Behavior](https://user-images.githubusercontent.com/2111293/217356063-0cf06b17-89f0-4d5f-85a9-b583438c98dd.png)
+
+# Instructions
+
+The code has been tested on Linux and Mac and requires Python 3.11+.
+
+## Docker
+
+You can either use the provided `Dockerfile` that contains instructions or
+follow the manual instructions below.
+
+## Manual
+
+Install [JAX][jax] and then the other dependencies:
+
+```sh
+pip install -U -r embodied/requirements.txt
+pip install -U -r dreamerv3/requirements.txt \
+  -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
+```
+
+Simple training script:
+
+```sh
+python example.py
+```
+
+Flexible training script:
+
+```sh
+python dreamerv3/main.py \
+  --logdir ~/logdir/{timestamp} \
+  --configs crafter \
+  --run.train_ratio 32
+```
+
+To reproduce results, train on the desired task using the corresponding config,
+such as `--configs atari --task atari_pong`.
+
+# Tips
+
+- All config options are listed in `configs.yaml` and you can override them
+  as flags from the command line.
+- The `debug` config block reduces the network size, batch size, duration
+  between logs, and so on for fast debugging (but does not learn a good model).
+- By default, the code tries to run on GPU. You can switch to CPU or TPU with
+  the `--jax.platform` flag, for example `--jax.platform cpu`.
+- You can use multiple config blocks that will override defaults in the
+  order they are specified, for example `--configs crafter size50m`.
+- By default, metrics are printed to the terminal, appended to a JSON lines
+  file, and written as TensorBoard summaries. Other outputs like WandB can be
+  enabled in the training script.
+- If you get a `Too many leaves for PyTreeDef` error, it means you're
+  reloading a checkpoint that is not compatible with the current config. This
+  often happens when reusing an old logdir by accident.
+- If you are getting CUDA errors, scroll up because the cause is often just an
+  error that happened earlier, such as out of memory or incompatible JAX and
+  CUDA versions. Try `--batch_size 1` to rule out an out of memory error.
+- Many environments are included, some of which require installing additional
+  packages. See the `Dockerfile` for reference.
+- When running on custom environments, make sure to specify the observation
+  keys the agent should be using via the `enc.spaces` and `dec.spaces` regex
+  patterns.
+- To log metrics from environments without showing them to the agent or storing
+  them in the replay buffer, return them as observation keys with `log_` prefix
+  and enable logging via the `run.log_keys_...` options.
+- To continue stopped training runs, simply run the same command line again and
+  make sure that the `--logdir` points to the same directory.
+
+# Disclaimer
+
+This repository contains a reimplementation of DreamerV3 based on the open
+source DreamerV2 code base. It is unrelated to Google or DeepMind. The
+implementation has been tested to reproduce the official results on a range of
+environments.
+
+[jax]: https://github.com/google/jax#pip-installation-gpu-cuda
+[paper]: https://arxiv.org/pdf/2301.04104v1.pdf
+[website]: https://danijar.com/dreamerv3
+[tweet]: https://twitter.com/danijarh/status/1613161946223677441
+[example]: https://github.com/danijar/dreamerv3/blob/main/example.py
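
Note on the README tips above: the README says that config blocks given via `--configs` override the defaults in the order they are specified, and that individual options can then be overridden as command-line flags. The sketch below illustrates that layering under stated assumptions. It is not the repository's config loader: the block names (`task_block`, `small_block`), their keys and values, and the helper functions `deep_update`, `set_dotted`, and `resolve` are all hypothetical.

```python
from copy import deepcopy

# Hypothetical config blocks standing in for entries of a configs.yaml file.
# Names, keys, and values are illustrative only, not the real defaults.
BLOCKS = {
    "defaults": {"task": "dummy", "batch_size": 16, "run": {"train_ratio": 64}},
    "task_block": {"task": "some_task", "run": {"train_ratio": 512}},
    "small_block": {"batch_size": 8},
}


def deep_update(base, override):
    """Return a copy of `base` with `override` merged in, recursing into dicts."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_update(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def set_dotted(config, dotted_key, value):
    """Return a copy of `config` with a dotted key such as 'run.train_ratio' set."""
    result = deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return result


def resolve(block_names, flags=None):
    """Start from defaults, apply blocks in the given order, then apply flags."""
    config = deepcopy(BLOCKS["defaults"])
    for name in block_names:
        config = deep_update(config, BLOCKS[name])
    for key, value in (flags or {}).items():
        config = set_dotted(config, key, value)
    return config


if __name__ == "__main__":
    # Roughly analogous to: --configs task_block small_block --run.train_ratio 32
    print(resolve(["task_block", "small_block"], {"run.train_ratio": 32}))
```

In this sketch, later blocks win over earlier ones and flags win over all blocks, which matches the ordering the README describes. How the actual code base parses and merges its config may differ.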
diff --git a/packages/learning-gym/dreamerv3/dreamerv3/Dockerfile b/packages/learning-gym/dreamerv3/dreamerv3/Dockerfile
@@ -0,0 +1,75 @@
+# Instructions
+#
+# 1) Test setup:
+#
+#   docker run -it --rm --gpus all --privileged <base image> \
+#     sh -c 'ldconfig; nvidia-smi'
+#
+# 2) Start training:
+#
+#   docker build -f dreamerv3/Dockerfile -t img . && \
+#   docker run -it --rm --gpus all -v ~/logdir/docker:/logdir img \
+#     sh -c 'ldconfig; sh embodied/scripts/xvfb_run.sh python dreamerv3/main.py \
+#       --logdir "/logdir/{timestamp}" --configs atari --task atari_pong'
+#
+# 3) See results:
+#
+#   tensorboard --logdir ~/logdir/docker
+#
+
+# System
+FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
+ENV DEBIAN_FRONTEND=noninteractive
+ENV TZ=America/Los_Angeles
+ENV PYTHONUNBUFFERED=1
+ENV PIP_NO_CACHE_DIR=1
+ENV PIP_ROOT_USER_ACTION=ignore
+RUN apt-get update && apt-get install -y \
+  ffmpeg git vim curl software-properties-common \
+  libglew-dev x11-xserver-utils xvfb \
+  && apt-get clean
+
+# Workdir
+RUN mkdir /app
+WORKDIR /app
+
+# Python
+RUN add-apt-repository ppa:deadsnakes/ppa
+RUN apt-get update && apt-get install -y python3.11-dev python3.11-venv && apt-get clean
+RUN python3.11 -m venv ./venv --upgrade-deps
+ENV PATH="/app/venv/bin:$PATH"
+RUN pip install --upgrade pip setuptools
+
+# Envs
+COPY embodied/scripts/install-minecraft.sh .
+RUN sh install-minecraft.sh
+COPY embodied/scripts/install-dmlab.sh .
+RUN sh install-dmlab.sh
+RUN pip install ale_py autorom[accept-rom-license]
+RUN pip install procgen_mirror
+RUN pip install crafter
+RUN pip install dm_control
+RUN pip install memory_maze
+ENV MUJOCO_GL=egl
+ENV NUMBA_CACHE_DIR=/tmp
+
+# Agent
+COPY dreamerv3/requirements.txt agent-requirements.txt
+RUN pip install -r agent-requirements.txt \
+  -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
+ENV XLA_PYTHON_CLIENT_MEM_FRACTION=0.8
+
+# Embodied
+COPY embodied/requirements.txt embodied-requirements.txt
+RUN pip install -r embodied-requirements.txt
+
+# Source
+COPY . .
+
+# Cloud
+ENV GCS_RESOLVE_REFRESH_SECS=60
+ENV GCS_REQUEST_CONNECTION_TIMEOUT_SECS=300
+ENV GCS_METADATA_REQUEST_TIMEOUT_SECS=300
+ENV GCS_READ_REQUEST_TIMEOUT_SECS=300
+ENV GCS_WRITE_REQUEST_TIMEOUT_SECS=600
+RUN chown 1000:root . && chmod 775 .
diff --git a/packages/learning-gym/dreamerv3/dreamerv3/__init__.py b/packages/learning-gym/dreamerv3/dreamerv3/__init__.py
@@ -0,0 +1,2 @@
+from .agent import Agent
+from .main import wrap_env