This is a PyTorch implementation of "Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification".
The original TensorFlow version can be found here.
Currently, only training on the door-human-v0 environment is supported. Support for other environments will be added later.
- Python 3.7
- a wandb account (register one if you do not have it yet)
- MuJoCo
- other packages are listed in requirements.txt
pip install -e .
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
pip install -e .
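After the install steps above, a quick sanity check can confirm that the dependencies import and that the door-human-v0 environment can be created. The snippet below is only a minimal sketch and is not part of this repository: the helper name missing_packages is hypothetical, and the list of package names (torch, wandb, gym, d4rl) is an assumption based on the requirements listed above.

```python
import importlib


def missing_packages(names):
    """Return the subset of `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # ImportError, or a setup error raised while the package is importing
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed package list; adjust it to match requirements.txt.
    missing = missing_packages(["torch", "wandb", "gym", "d4rl"])
    if missing:
        print("Missing or broken packages:", ", ".join(missing))
    else:
        import gym
        import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

        try:
            env = gym.make("door-human-v0")
            print("Observation space:", env.observation_space)
            print("Action space:", env.action_space)
        except Exception as err:
            # Typically a MuJoCo installation or license problem.
            print("Could not create door-human-v0:", err)
```

If the script reports missing packages or fails to create the environment, the MuJoCo and d4rl installation is the first thing to re-check before running the trainer.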
- Support training on other environments in Meta-World.
All the arguments can be found in argments.py.
python trainer.py