commit f216fdf
Showing 61 changed files with 9,623 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,175 @@ | ||
## Core latex/pdflatex auxiliary files: | ||
*.aux | ||
*.lof | ||
*.log | ||
*.lot | ||
*.fls | ||
*.out | ||
*.toc | ||
*.fmt | ||
.DS_Store | ||
*/temp/* | ||
*.pyc | ||
*./.idea/* | ||
.idea/* | ||
*.DS_Store* | ||
*.ipynb_checkpoints/* | ||
notebooks/.ipynb_checkpoints/* | ||
*.dropbox* | ||
*Icon* | ||
*/__pycache__/* | ||
*/.ipynb_checkpoints/* | ||
## Intermediate documents: | ||
*.dvi | ||
*-converted-to.* | ||
# these rules might exclude image files for figures etc. | ||
# *.ps | ||
# *.eps | ||
|
||
## Bibliography auxiliary files (bibtex/biblatex/biber): | ||
*.bbl | ||
*.bcf | ||
*.blg | ||
*-blx.aux | ||
*-blx.bib | ||
*.brf | ||
*.run.xml | ||
|
||
## Build tool auxiliary files: | ||
*.fdb_latexmk | ||
*.synctex | ||
*.synctex.gz | ||
*.synctex.gz(busy) | ||
*.pdfsync | ||
|
||
## Auxiliary and intermediate files from other packages: | ||
# algorithms | ||
*.alg | ||
*.loa | ||
|
||
# achemso | ||
acs-*.bib | ||
|
||
# amsthm | ||
*.thm | ||
|
||
# beamer | ||
*.nav | ||
*.snm | ||
*.vrb | ||
|
||
# cprotect | ||
*.cpt | ||
|
||
#(e)ledmac/(e)ledpar | ||
*.end | ||
*.[1-9] | ||
*.[1-9][0-9] | ||
*.[1-9][0-9][0-9] | ||
*.[1-9]R | ||
*.[1-9][0-9]R | ||
*.[1-9][0-9][0-9]R | ||
*.eledsec[1-9] | ||
*.eledsec[1-9]R | ||
*.eledsec[1-9][0-9] | ||
*.eledsec[1-9][0-9]R | ||
*.eledsec[1-9][0-9][0-9] | ||
*.eledsec[1-9][0-9][0-9]R | ||
|
||
# glossaries | ||
*.acn | ||
*.acr | ||
*.glg | ||
*.glo | ||
*.gls | ||
|
||
# gnuplottex | ||
*-gnuplottex-* | ||
|
||
# hyperref | ||
*.brf | ||
|
||
# knitr | ||
*-concordance.tex | ||
*.tikz | ||
*-tikzDictionary | ||
|
||
# listings | ||
*.lol | ||
|
||
# makeidx | ||
*.idx | ||
*.ilg | ||
*.ind | ||
*.ist | ||
|
||
# minitoc | ||
*.maf | ||
*.mtc | ||
*.mtc[0-9] | ||
*.mtc[1-9][0-9] | ||
|
||
# minted | ||
_minted* | ||
*.pyg | ||
*.pyc | ||
# morewrites | ||
*.mw | ||
|
||
# mylatexformat | ||
*.fmt | ||
|
||
# nomencl | ||
*.nlo | ||
|
||
# sagetex | ||
*.sagetex.sage | ||
*.sagetex.py | ||
*.sagetex.scmd | ||
|
||
# sympy | ||
*.sout | ||
*.sympy | ||
sympy-plots-for-*.tex/ | ||
|
||
# pdfcomment | ||
*.upa | ||
*.upb | ||
|
||
#pythontex | ||
*.pytxcode | ||
pythontex-files-*/ | ||
|
||
# Texpad | ||
.texpadtmp | ||
|
||
# TikZ & PGF | ||
*.dpth | ||
*.md5 | ||
*.auxlock | ||
|
||
# todonotes | ||
*.tdo | ||
|
||
# xindy | ||
*.xdy | ||
|
||
# xypic precompiled matrices | ||
*.xyc | ||
|
||
# WinEdt | ||
*.bak | ||
*.sav | ||
|
||
# endfloat | ||
*.ttt | ||
*.fff | ||
|
||
# Latexian | ||
TSWLatexianTemp* | ||
|
||
main.pdf | ||
|
||
*.dropbox* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# Ranking Policy Gradient | ||
Ranking Policy Gradient (RPG) is a sample-efficient policy gradient method | ||
that learns the optimal ranking of actions with respect to the long-term reward. | ||
This codebase contains the implementation of RPG using the | ||
[dopamine](https://github.com/google/dopamine) framework. | ||
|
||
|
||
## Instructions | ||
|
||
|
||
### Install via source | ||
#### Step 1. | ||
Follow the [installation instructions](https://github.com/KaixiangLin/dopamine/blob/master/README.md#install-via-source) of the | ||
dopamine framework for [Ubuntu](https://github.com/KaixiangLin/dopamine/blob/master/README.md#ubuntu) | ||
or [Mac OS X](https://github.com/KaixiangLin/dopamine/blob/master/README.md#mac-os-x). | ||
|
||
#### Step 2. | ||
Download the RPG source: | ||
|
||
``` | ||
git clone git@github.com:illidanlab/rpg.git | ||
``` | ||
|
||
|
||
## Running the tests | ||
|
||
``` | ||
cd ./rpg/dopamine | ||
python -um dopamine.atari.train \ | ||
--agent_name=rpg \ | ||
--base_dir=/tmp/dopamine \ | ||
--random_seed 1 \ | ||
--game_name=Pong \ | ||
--gin_files='dopamine/agents/rpg/configs/rpg.gin' | ||
``` | ||
|
||
## Reproduce | ||
To reproduce the results in the paper, please refer to the instructions [here](code.md). | ||
|
||
### Reference | ||
|
||
If you use this RPG implementation in your work, please consider citing the following papers: | ||
``` | ||
TODO(RPG): | ||
``` | ||
|
||
## Acknowledgments | ||
TODO(dopamine framework, fundings). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Overview | ||
|
||
This document explains the structure of this codebase and the hyperparameters of the experiments. | ||
|
||
|
||
## File organization | ||
|
||
### Step 1. | ||
Please refer to the description of dopamine's file organization [here](https://github.com/KaixiangLin/dopamine/blob/master/docs/README.md#file-organization). | ||
|
||
### Step 2. | ||
We add variants of RPG agents in [this folder](dopamine/dopamine/agents). Each agent is explained below: | ||
|
||
|
||
| Folder | Exploration | Supervision | | ||
|---|---|---| | ||
| rpg | epsilon-greedy | RPG (Hinge loss) | | ||
| lpg | epsilon-greedy | LPG (Cross-Entropy) | | ||
| epg | EPG | LPG (Cross-Entropy) | | ||
| repg | EPG | RPG (Hinge loss) | | ||
| implicit_quantilerpg | implicit_quantile | RPG (Hinge loss) | | ||
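The Exploration column contrasts epsilon-greedy action selection with stochastic (EPG-style) sampling. The following is a minimal sketch of these selection rules over a list of action logits, not the code in this repository. The function names, the use of a softmax over logits for stochastic sampling, and the `rng` argument are all assumptions made for illustration.

```python
import math
import random


def greedy_action(logits):
    """Pick the index of the largest logit (deterministic)."""
    return max(range(len(logits)), key=lambda a: logits[a])


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_action(logits, rng):
    """Sample an action from softmax(logits) (assumed form of stochastic exploration)."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def epsilon_greedy_action(logits, epsilon, rng):
    """With probability epsilon act uniformly at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(logits))
    return greedy_action(logits)


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical seed, for repeatability only
    logits = [0.1, 2.0, -1.0]
    print(greedy_action(logits))
    print(epsilon_greedy_action(logits, 0.1, rng))
    print(stochastic_action(logits, rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible without touching global random state.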
|
||
|
||
* EPG: EPG is the stochastic listwise policy gradient | ||
with off-policy supervised learning, i.e., the vanilla policy gradient trained | ||
with off-policy supervised learning. The exploration and supervision agents are parameterized | ||
by the same neural network. The supervision agent minimizes the cross-entropy loss | ||
over the near-optimal trajectories collected in an online fashion. | ||
|
||
* LPG: LPG is the deterministic listwise policy gradient with off-policy supervised learning. | ||
During evaluation it chooses actions greedily based on the values of the logits, and it explores | ||
the environment stochastically, as EPG does. | ||
|
||
* RPG: RPG explores the environment using a separate agent: epsilon-greedy or EPG in Pong, and | ||
Implicit Quantile in other games. RPG then conducts supervised | ||
learning by minimizing the hinge loss. | ||
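The two supervision objectives named above can be sketched for a single state and a single demonstrated action. This is only an illustration under stated assumptions: the cross-entropy is the standard softmax negative log-likelihood, and the hinge loss is written in one common pairwise-ranking form with a hypothetical `margin` parameter. The exact losses used by the agents in this codebase may differ.

```python
import math


def cross_entropy_loss(logits, action):
    """Negative log-probability of `action` under softmax(logits)."""
    m = max(logits)
    # log-sum-exp computed stably by subtracting the maximum logit
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[action]


def pairwise_hinge_loss(logits, action, margin=1.0):
    """Sum of hinge penalties over every other action.

    An action j is penalized when logits[action] does not exceed
    logits[j] by at least `margin` (assumed pairwise-ranking form).
    """
    return sum(
        max(0.0, margin - (logits[action] - logit))
        for j, logit in enumerate(logits)
        if j != action
    )


if __name__ == "__main__":
    logits = [3.0, 1.0, 0.0]
    print(cross_entropy_loss(logits, 0))
    print(pairwise_hinge_loss(logits, 0))
```

The hinge form is zero once the demonstrated action is ranked above all others by the margin, whereas the cross-entropy keeps decreasing as the logit gap grows.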
|
||
In this codebase, the folder [rpg](dopamine/dopamine/agents/rpg) | ||
contains the code of RPG with epsilon-greedy exploration, [repg](dopamine/dopamine/agents/repg) contains RPG with EPG exploration, and | ||
[implicit_quantilerpg](dopamine/dopamine/agents/implicit_quantilerpg) | ||
contains RPG with implicit quantile network exploration. | ||
|
||
The agents with relatively simple exploration strategies (rpg, lpg, epg, repg) perform well on Pong | ||
compared to the state of the art, since there is a higher chance of hitting good trajectories in Pong. | ||
For more complicated games, we adopt the implicit quantile network as the exploration agent. | ||
|
||
## Hyperparameters | ||
The hyperparameters of the networks, optimizers, etc., are the same as the [baselines](https://github.com/KaixiangLin/dopamine/tree/master/baselines) in dopamine. | ||
The trajectory reward threshold c (see Def 5 in the paper (TODO)) for each game is given as follows: | ||
|
||
| game | c | | ||
|---|---| | ||
| Boxing | 100 | | ||
| Breakout | 400 | | ||
| Bowling | 80 | | ||
| BankHeist | 1100 | | ||
| DoubleDunk | 18 | | ||
| Pitfall | 0 | | ||
| Pong | 1 | | ||
| Robotank | 65 | | ||
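As a rough illustration of how the threshold c could be applied, the sketch below keeps only trajectories whose total reward reaches c for the given game. The threshold values come from the table above. Everything else is an assumption: the names `TRAJECTORY_REWARD_THRESHOLD`, `is_near_optimal`, and `select_near_optimal` are hypothetical, and so are the use of an undiscounted reward sum and the `>=` comparison. Def 5 in the paper is the authoritative definition.

```python
# Thresholds c per game, copied from the table above.
TRAJECTORY_REWARD_THRESHOLD = {
    "Boxing": 100,
    "Breakout": 400,
    "Bowling": 80,
    "BankHeist": 1100,
    "DoubleDunk": 18,
    "Pitfall": 0,
    "Pong": 1,
    "Robotank": 65,
}


def is_near_optimal(game, rewards):
    """True if the trajectory's summed reward reaches the game's threshold c.

    Assumes an undiscounted sum and a non-strict comparison.
    """
    return sum(rewards) >= TRAJECTORY_REWARD_THRESHOLD[game]


def select_near_optimal(game, trajectories):
    """Keep only the trajectories (lists of per-step rewards) that pass the threshold."""
    return [rewards for rewards in trajectories if is_near_optimal(game, rewards)]


if __name__ == "__main__":
    # Hypothetical per-step reward sequences for two Boxing episodes.
    episodes = [[50, 50], [10]]
    print(select_near_optimal("Boxing", episodes))
```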
|
||
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# coding=utf-8 | ||
# Copyright 2018 The Dopamine Authors. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
name = 'dopamine' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# coding=utf-8 | ||
# Copyright 2018 The Dopamine Authors. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|