Chuning Zhu<sup>1</sup>, Max Simchowitz<sup>2</sup>, Siri Gadipudi<sup>1</sup>, Abhishek Gupta<sup>1</sup>

<sup>1</sup>University of Washington, <sup>2</sup>MIT
This is a PyTorch implementation of the RePo algorithm. RePo is a visual model-based reinforcement learning method that learns a minimal task-relevant representation, making it resilient to uncontrollable distractors in the environment. We also provide implementations of Dreamer, TIA, DBC, and DeepMDP.
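At a high level, RePo can be read as reward prediction under a KL constraint between an observation-conditioned posterior and an observation-free prior: whatever the latent dynamics cannot predict, such as background video, gets squeezed out of the representation. The sketch below is a heavily simplified illustration of that objective based only on the paper title and the description above, not this repository's code; all names (`TinyRePoSketch`, `kl_budget`, `lagrange`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.distributions as td

class TinyRePoSketch(nn.Module):
    """Illustrative only. A posterior q(z_t | z_{t-1}, a_{t-1}, o_t), a prior
    p(z_t | z_{t-1}, a_{t-1}) that never sees the observation, and a reward
    head. The latent is trained to predict reward while KL(posterior || prior)
    is kept small, so observation details the dynamics cannot predict are
    discouraged from entering the latent."""

    def __init__(self, obs_dim=64, act_dim=4, z_dim=32):
        super().__init__()
        self.post = nn.Linear(z_dim + act_dim + obs_dim, 2 * z_dim)
        self.prior = nn.Linear(z_dim + act_dim, 2 * z_dim)
        self.reward = nn.Linear(z_dim, 1)

    @staticmethod
    def _gauss(stats):
        mean, log_std = stats.chunk(2, dim=-1)
        return td.Normal(mean, log_std.exp().clamp(1e-3, 10.0))

    def loss(self, z_prev, a_prev, obs, rew, kl_budget=1.0, lagrange=1.0):
        q = self._gauss(self.post(torch.cat([z_prev, a_prev, obs], -1)))
        p = self._gauss(self.prior(torch.cat([z_prev, a_prev], -1)))
        z = q.rsample()
        reward_err = (self.reward(z).squeeze(-1) - rew).pow(2).mean()
        kl = td.kl_divergence(q, p).sum(-1).mean()
        # Lagrangian form of "predict rewards s.t. KL <= budget"; a fixed
        # multiplier is used here purely for illustration.
        return reward_err + lagrange * (kl - kl_budget)
```

The actual model is a recurrent world model trained on pixels; see `experiments/train_repo.py` below for the real entry point.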
Clone the repository:

```bash
git clone https://github.com/zchuning/repo.git
```

Then set up the environment:

- Install MuJoCo 2.1.0
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
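As a quick sanity check that the MuJoCo-backed environments load, the following snippet can help; it assumes `dm_control` is among the pinned requirements, which we have not verified here.

```python
# Smoke test: load the undistracted Walker Walk task via dm_control.
# If MuJoCo 2.1.0 is not set up correctly, the import or load will fail.
from dm_control import suite

env = suite.load(domain_name="walker", task_name="walk")
timestep = env.reset()
print(sorted(timestep.observation.keys()))  # e.g. ['height', 'orientations', 'velocity']
```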
To train on DMC with natural video distractors, download the `driving_car` videos from the Kinetics 400 dataset following these instructions. Then, use one of the following commands to train an agent on distracted Walker Walk. To train on other distracted DMC environments, replace `walker-walk` with `{domain}-{task}`:
```bash
# RePo
python experiments/train_repo.py --algo repo --env_id dmc_distracted-walker-walk --expr_name benchmark --seed 0

# Dreamer
python experiments/train_repo.py --algo dreamer --env_id dmc_distracted-walker-walk --expr_name benchmark --seed 0

# TIA
python experiments/train_repo.py --algo tia --env_id dmc_distracted-walker-walk --expr_name benchmark --seed 0

# DBC
python experiments/train_bisim.py --algo bisim --env_id dmc_distracted-walker-walk --expr_name benchmark --seed 0

# DeepMDP
python experiments/train_bisim.py --algo deepmdp --env_id dmc_distracted-walker-walk --expr_name benchmark --seed 0
```
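To queue all five baselines (and, optionally, multiple seeds) without typing each command, a small launcher like the one below works. It is a convenience sketch assembled from the commands above, not a script shipped with the repository.

```python
# Hypothetical launcher: runs each algorithm sequentially on distracted Walker Walk.
import subprocess

RUNS = [
    ("experiments/train_repo.py", "repo"),
    ("experiments/train_repo.py", "dreamer"),
    ("experiments/train_repo.py", "tia"),
    ("experiments/train_bisim.py", "bisim"),
    ("experiments/train_bisim.py", "deepmdp"),
]

for script, algo in RUNS:
    for seed in (0, 1, 2):  # multiple seeds are our assumption; the README shows seed 0
        subprocess.run(
            ["python", script,
             "--algo", algo,
             "--env_id", "dmc_distracted-walker-walk",
             "--expr_name", "benchmark",
             "--seed", str(seed)],
            check=True,
        )
```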
First, download the background assets from this link and place the `data` folder in the root directory of the repository. Then, use the following command to train an agent on a ManiSkill environment, where `{task}` is one of `{PushCubeMatterport, LiftCubeMatterport, TurnFaucetMatterport}`:
```bash
python experiments/train_repo.py --algo repo --env_id maniskill-{task} --expr_name benchmark --seed 0
```
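The same launcher pattern as above extends to sweeping the three tasks. The snippet below is a hypothetical pre-flight check plus loop; only the presence of the `data` folder is checked, since its expected contents are not documented here.

```python
# Hypothetical sweep over the three ManiSkill tasks named above.
import subprocess
from pathlib import Path

assert Path("data").is_dir(), "download the background assets and place `data` at the repo root"

for task in ("PushCubeMatterport", "LiftCubeMatterport", "TurnFaucetMatterport"):
    subprocess.run(
        ["python", "experiments/train_repo.py",
         "--algo", "repo",
         "--env_id", f"maniskill-{task}",
         "--expr_name", "benchmark",
         "--seed", "0"],
        check=True,
    )
```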
To run adaptation experiments, first train an agent on the source domain and save the replay buffer:
```bash
python experiments/train_repo.py --algo repo --env_id dmc-walker-walk --expr_name benchmark --seed 0 --save_buffer True
```
Then run the adaptation experiment on the target domain using one of the following commands:
```bash
# Support constraint + calibration
python experiments/adapt_repo.py --algo repo_calibrate --env_id dmc_distracted-walker-walk --expr_name adaptation --source_dir logdir/repo/dmc-walker-walk/benchmark/0 --seed 0

# Distribution matching + calibration
python experiments/adapt_repo.py --algo repo_calibrate --env_id dmc_distracted-walker-walk --expr_name adaptation --source_dir logdir/repo/dmc-walker-walk/benchmark/0 --seed 0 --alignment_mode distribution
```
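In both commands, `--source_dir` points at the source run's log directory. From the single example above it appears to follow `logdir/{algo}/{env_id}/{expr_name}/{seed}`; this layout is an inference, not a documented contract, so verify it against your own `logdir`.

```python
# Hypothetical helper reconstructing a source run directory for adaptation.
from pathlib import Path

def source_dir(algo="repo", env_id="dmc-walker-walk", expr_name="benchmark", seed=0):
    # Layout inferred from the example path logdir/repo/dmc-walker-walk/benchmark/0
    return Path("logdir") / algo / env_id / expr_name / str(seed)

print(source_dir())  # -> logdir/repo/dmc-walker-walk/benchmark/0
```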
If you find this code useful, please cite:
```bibtex
@inproceedings{
  zhu2023repo,
  title={RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability},
  author={Chuning Zhu and Max Simchowitz and Siri Gadipudi and Abhishek Gupta},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=OIJ3VXDy6s}
}
```