Experiment with dreamerv3 on polyburn
phisn committed May 10, 2024
1 parent f55d320 commit 390f863
Showing 101 changed files with 14,892 additions and 16 deletions.
3 changes: 3 additions & 0 deletions packages/learning-gym/dreamerv3/.dockerignore
@@ -0,0 +1,3 @@
*.py[cod]
__pycache__/
dist
7 changes: 7 additions & 0 deletions packages/learning-gym/dreamerv3/.gitignore
@@ -0,0 +1,7 @@
.pytest_cache
dist
__pycache__/
*.py[cod]
*.egg-info
MUJOCO_LOG.TXT
;
19 changes: 19 additions & 0 deletions packages/learning-gym/dreamerv3/LICENSE
@@ -0,0 +1,19 @@
Copyright (c) 2023 Danijar Hafner

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
1 change: 1 addition & 0 deletions packages/learning-gym/dreamerv3/MANIFEST.in
@@ -0,0 +1 @@
include dreamerv3/requirements.txt
127 changes: 127 additions & 0 deletions packages/learning-gym/dreamerv3/README.md
@@ -0,0 +1,127 @@
# Mastering Diverse Domains through World Models

A reimplementation of [DreamerV3][paper], a scalable and general reinforcement
learning algorithm that masters a wide range of applications with fixed
hyperparameters.

![DreamerV3 Tasks](https://user-images.githubusercontent.com/2111293/217647148-cbc522e2-61ad-4553-8e14-1ecdc8d9438b.gif)

If you find this code useful, please cite it in your paper:

```
@article{hafner2023dreamerv3,
title={Mastering Diverse Domains through World Models},
author={Hafner, Danijar and Pasukonis, Jurgis and Ba, Jimmy and Lillicrap, Timothy},
journal={arXiv preprint arXiv:2301.04104},
year={2023}
}
```

To learn more:

- [Research paper][paper]
- [Project website][website]
- [Twitter summary][tweet]

## DreamerV3

DreamerV3 learns a world model from experiences and uses it to train an actor
critic policy from imagined trajectories. The world model encodes sensory
inputs into categorical representations and predicts future representations and
rewards given actions.

![DreamerV3 Method Diagram](https://user-images.githubusercontent.com/2111293/217355673-4abc0ce5-1a4b-4366-a08d-64754289d659.png)

DreamerV3 masters a wide range of domains with a fixed set of hyperparameters,
outperforming specialized methods. Removing the need for tuning reduces the
amount of expert knowledge and computational resources needed to apply
reinforcement learning.

![DreamerV3 Benchmark Scores](https://github.com/danijar/dreamerv3/assets/2111293/0fe8f1cf-6970-41ea-9efc-e2e2477e7861)

Due to its robustness, DreamerV3 shows favorable scaling properties. Notably,
using larger models consistently increases not only its final performance but
also its data-efficiency. Increasing the number of gradient steps further
increases data efficiency.

![DreamerV3 Scaling Behavior](https://user-images.githubusercontent.com/2111293/217356063-0cf06b17-89f0-4d5f-85a9-b583438c98dd.png)

# Instructions

The code has been tested on Linux and Mac and requires Python 3.11+.

## Docker

You can either use the provided `Dockerfile`, which documents its usage in the
header comments, or follow the manual instructions below.

## Manual

Install [JAX][jax] and then the other dependencies:

```sh
pip install -U -r embodied/requirements.txt
pip install -U -r dreamerv3/requirements.txt \
-f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

Simple training script:

```sh
python example.py
```

Flexible training script:

```sh
python dreamerv3/main.py \
--logdir ~/logdir/{timestamp} \
--configs crafter \
--run.train_ratio 32
```

To reproduce results, train on the desired task using the corresponding config,
such as `--configs atari --task atari_pong`.

# Tips

- All config options are listed in `configs.yaml` and you can override them
as flags from the command line.
- The `debug` config block reduces the network size, batch size, duration
between logs, and so on for fast debugging (but does not learn a good model).
- By default, the code tries to run on GPU. You can switch to CPU or TPU using
the `--jax.platform` flag, for example `--jax.platform cpu`.
- You can use multiple config blocks that will override defaults in the
order they are specified, for example `--configs crafter size50m`.
- By default, metrics are printed to the terminal, appended to a JSON lines
file, and written as TensorBoard summaries. Other outputs like WandB can be
enabled in the training script.
- If you get a `Too many leaves for PyTreeDef` error, it means you're
reloading a checkpoint that is not compatible with the current config. This
often happens when reusing an old logdir by accident.
- If you are getting CUDA errors, scroll up because the cause is often just an
error that happened earlier, such as out of memory or incompatible JAX and
CUDA versions. Try `--batch_size 1` to rule out an out of memory error.
- Many environments are included, some of which require installing additional
packages. See the `Dockerfile` for reference.
- When running on custom environments, make sure to specify the observation
keys the agent should be using via the `enc.spaces` and `dec.spaces` regex
patterns.
- To log metrics from environments without showing them to the agent or storing
them in the replay buffer, return them as observation keys with `log_` prefix
and enable logging via the `run.log_keys_...` options.
- To continue stopped training runs, simply run the same command line again and
make sure that the `--logdir` points to the same directory.
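
As a rough illustration of two of the tips above (config blocks being applied
in the order they are specified, and regex patterns selecting observation
keys), here is a minimal Python sketch. It is not the code used in this
repository: the contents of `CONFIGS`, the helper names `deep_update`,
`resolve_config`, and `select_keys`, and the use of `re.match` are all
assumptions made for illustration.

```python
import re
from copy import deepcopy

# Hypothetical stand-in for configs.yaml: a "defaults" block plus named
# override blocks. Keys and values are illustrative, not the real ones.
CONFIGS = {
    "defaults": {
        "task": "dummy_disc",
        "batch_size": 16,
        "run": {"train_ratio": 32.0, "steps": 1e10},
    },
    "crafter": {"task": "crafter_reward", "run": {"train_ratio": 512.0}},
    "debug": {"batch_size": 8, "run": {"steps": 1e4}},
}


def deep_update(base, overrides):
    """Return a copy of `base` with `overrides` merged in recursively."""
    result = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_update(result[key], value)
        else:
            result[key] = value
    return result


def resolve_config(configs, block_names, flags=None):
    """Apply blocks in the given order, then dotted command-line style flags."""
    config = deepcopy(configs["defaults"])
    for name in block_names:
        config = deep_update(config, configs[name])
    for dotted, value in (flags or {}).items():
        # Turn "run.train_ratio" into {"run": {"train_ratio": value}}.
        nested = value
        for part in reversed(dotted.split(".")):
            nested = {part: nested}
        config = deep_update(config, nested)
    return config


def select_keys(obs, pattern):
    """Keep keys matching `pattern`, always dropping `log_` keys (assumed rule)."""
    regex = re.compile(pattern)
    return {
        key: value
        for key, value in obs.items()
        if regex.match(key) and not key.startswith("log_")
    }


if __name__ == "__main__":
    # Roughly mirrors `--configs crafter debug --run.train_ratio 32`.
    print(resolve_config(CONFIGS, ["crafter", "debug"], {"run.train_ratio": 32}))
    print(select_keys({"image": 1, "vector": 2, "log_distance": 3}, "image|log_.*"))
```

In this sketch later blocks win over earlier ones and explicit flags win over
all blocks, which matches the ordering described in the tips. Whether the real
code treats nested keys or `log_` keys exactly this way should be checked
against the source.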

# Disclaimer

This repository contains a reimplementation of DreamerV3 based on the open
source DreamerV2 code base. It is unrelated to Google or DeepMind. The
implementation has been tested to reproduce the official results on a range of
environments.

[jax]: https://github.com/google/jax#pip-installation-gpu-cuda
[paper]: https://arxiv.org/pdf/2301.04104v1.pdf
[website]: https://danijar.com/dreamerv3
[tweet]: https://twitter.com/danijarh/status/1613161946223677441
[example]: https://github.com/danijar/dreamerv3/blob/main/example.py
75 changes: 75 additions & 0 deletions packages/learning-gym/dreamerv3/dreamerv3/Dockerfile
@@ -0,0 +1,75 @@
# Instructions
#
# 1) Test setup:
#
# docker run -it --rm --gpus all --privileged <base image> \
# sh -c 'ldconfig; nvidia-smi'
#
# 2) Start training:
#
# docker build -f dreamerv3/Dockerfile -t img . && \
# docker run -it --rm --gpus all -v ~/logdir/docker:/logdir img \
# sh -c 'ldconfig; sh embodied/scripts/xvfb_run.sh python dreamerv3/main.py \
# --logdir "/logdir/{timestamp}" --configs atari --task atari_pong'
#
# 3) See results:
#
# tensorboard --logdir ~/logdir/docker
#

# System
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=America/Los_Angeles
ENV PYTHONUNBUFFERED=1
ENV PIP_NO_CACHE_DIR=1
ENV PIP_ROOT_USER_ACTION=ignore
RUN apt-get update && apt-get install -y \
ffmpeg git vim curl software-properties-common \
libglew-dev x11-xserver-utils xvfb \
&& apt-get clean

# Workdir
RUN mkdir /app
WORKDIR /app

# Python
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.11-dev python3.11-venv && apt-get clean
RUN python3.11 -m venv ./venv --upgrade-deps
ENV PATH="/app/venv/bin:$PATH"
RUN pip install --upgrade pip setuptools

# Envs
COPY embodied/scripts/install-minecraft.sh .
RUN sh install-minecraft.sh
COPY embodied/scripts/install-dmlab.sh .
RUN sh install-dmlab.sh
RUN pip install ale_py autorom[accept-rom-license]
RUN pip install procgen_mirror
RUN pip install crafter
RUN pip install dm_control
RUN pip install memory_maze
ENV MUJOCO_GL=egl
ENV NUMBA_CACHE_DIR=/tmp

# Agent
COPY dreamerv3/requirements.txt agent-requirements.txt
RUN pip install -r agent-requirements.txt \
-f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
ENV XLA_PYTHON_CLIENT_MEM_FRACTION=0.8

# Embodied
COPY embodied/requirements.txt embodied-requirements.txt
RUN pip install -r embodied-requirements.txt

# Source
COPY . .

# Cloud
ENV GCS_RESOLVE_REFRESH_SECS=60
ENV GCS_REQUEST_CONNECTION_TIMEOUT_SECS=300
ENV GCS_METADATA_REQUEST_TIMEOUT_SECS=300
ENV GCS_READ_REQUEST_TIMEOUT_SECS=300
ENV GCS_WRITE_REQUEST_TIMEOUT_SECS=600
RUN chown 1000:root . && chmod 775 .
2 changes: 2 additions & 0 deletions packages/learning-gym/dreamerv3/dreamerv3/__init__.py
@@ -0,0 +1,2 @@
from .agent import Agent
from .main import wrap_env