Commit

first commit.
KaixiangLin committed Jun 25, 2019
0 parents commit f216fdf
Showing 61 changed files with 9,623 additions and 0 deletions.
175 changes: 175 additions & 0 deletions .gitignore
## Core latex/pdflatex auxiliary files:
*.aux
*.lof
*.log
*.lot
*.fls
*.out
*.toc
*.fmt
.DS_Store
*/temp/*
*.pyc
*/.idea/*
.idea/*
*.DS_Store*
*.ipynb_checkpoints/*
notebooks/.ipynb_checkpoints/*
*.dropbox*
*Icon*
*/__pycache__/*
*/.ipynb_checkpoints/*
## Intermediate documents:
*.dvi
*-converted-to.*
# these rules might exclude image files for figures etc.
# *.ps
# *.eps
# *.pdf

## Bibliography auxiliary files (bibtex/biblatex/biber):
*.bbl
*.bcf
*.blg
*-blx.aux
*-blx.bib
*.brf
*.run.xml

## Build tool auxiliary files:
*.fdb_latexmk
*.synctex
*.synctex.gz
*.synctex.gz(busy)
*.pdfsync

## Auxiliary and intermediate files from other packages:
# algorithms
*.alg
*.loa

# achemso
acs-*.bib

# amsthm
*.thm

# beamer
*.nav
*.snm
*.vrb

# cprotect
*.cpt

#(e)ledmac/(e)ledpar
*.end
*.[1-9]
*.[1-9][0-9]
*.[1-9][0-9][0-9]
*.[1-9]R
*.[1-9][0-9]R
*.[1-9][0-9][0-9]R
*.eledsec[1-9]
*.eledsec[1-9]R
*.eledsec[1-9][0-9]
*.eledsec[1-9][0-9]R
*.eledsec[1-9][0-9][0-9]
*.eledsec[1-9][0-9][0-9]R

# glossaries
*.acn
*.acr
*.glg
*.glo
*.gls

# gnuplottex
*-gnuplottex-*

# hyperref
*.brf

# knitr
*-concordance.tex
*.tikz
*-tikzDictionary

# listings
*.lol

# makeidx
*.idx
*.ilg
*.ind
*.ist

# minitoc
*.maf
*.mtc
*.mtc[0-9]
*.mtc[1-9][0-9]

# minted
_minted*
*.pyg
*.pyc
# morewrites
*.mw

# mylatexformat
*.fmt

# nomencl
*.nlo

# sagetex
*.sagetex.sage
*.sagetex.py
*.sagetex.scmd

# sympy
*.sout
*.sympy
sympy-plots-for-*.tex/

# pdfcomment
*.upa
*.upb

#pythontex
*.pytxcode
pythontex-files-*/

# Texpad
.texpadtmp

# TikZ & PGF
*.dpth
*.md5
*.auxlock

# todonotes
*.tdo

# xindy
*.xdy

# xypic precompiled matrices
*.xyc

# WinEdt
*.bak
*.sav

# endfloat
*.ttt
*.fff

# Latexian
TSWLatexianTemp*

main.pdf

*.dropbox*

48 changes: 48 additions & 0 deletions README.md
# Ranking Policy Gradient
Ranking Policy Gradient (RPG) is a sample-efficient policy gradient method
that learns the optimal ranking of actions with respect to the long-term reward.
This codebase contains an implementation of RPG using the
[dopamine](https://github.com/google/dopamine) framework.


## Instructions


### Install via source
#### Step 1.
Follow the installation [instructions](https://github.com/KaixiangLin/dopamine/blob/master/README.md#install-via-source) for the
dopamine framework on [Ubuntu](https://github.com/KaixiangLin/dopamine/blob/master/README.md#ubuntu)
or [Mac OS X](https://github.com/KaixiangLin/dopamine/blob/master/README.md#mac-os-x).

#### Step 2.
Download the RPG source:

```
git clone git@github.com:illidanlab/rpg.git
```


## Running the tests

```
cd ./rpg/dopamine
python -um dopamine.atari.train \
--agent_name=rpg \
--base_dir=/tmp/dopamine \
--random_seed 1 \
--game_name=Pong \
--gin_files='dopamine/agents/rpg/configs/rpg.gin'
```
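
The other agents described in [code.md](code.md) (`lpg`, `epg`, `repg`, and `implicit_quantilerpg`) can presumably be run in the same way by changing `--agent_name` and pointing `--gin_files` at the corresponding config under `dopamine/agents/<agent>/configs/`.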

## Reproduce
To reproduce the results in the paper, please refer to the instructions in [code.md](code.md).

### Reference

If you use this RPG implementation in your work, please consider citing the following papers:
```
TODO(RPG):
```

## Acknowledgments
TODO(dopamine framework, fundings).
65 changes: 65 additions & 0 deletions code.md
# Overview

This document explains the structure of this codebase and the hyperparameters of the experiments.


## File organization

### Step 1.
Please refer to the description of the dopamine file organization [here](https://github.com/KaixiangLin/dopamine/blob/master/docs/README.md#file-organization).

### Step 2.
We add variants of RPG agents in [this folder](dopamine/dopamine/agents); each agent is explained below:


| Folder | Exploration | Supervision |
|---|---|---|
| rpg | epsilon-greedy | RPG (hinge loss) |
| lpg | epsilon-greedy | LPG (cross-entropy) |
| epg | EPG | LPG (cross-entropy) |
| repg | EPG | RPG (hinge loss) |
| implicit_quantilerpg | implicit_quantile | RPG (hinge loss) |


* EPG: the stochastic listwise policy gradient (i.e., the vanilla policy gradient) trained
with off-policy supervised learning. The exploration and supervision agents are parameterized
by the same neural network. The supervision agent minimizes the cross-entropy loss
over the near-optimal trajectories collected in an online fashion.

* LPG: the deterministic listwise policy gradient with off-policy supervised learning.
During evaluation, the action is chosen greedily based on the logits; during training, the agent
explores the environment stochastically, in the same way as EPG.

* RPG: RPG explores the environment using a separate agent: epsilon-greedy or EPG in Pong, and
Implicit Quantile in the other games. It then performs off-policy supervised
learning by minimizing the hinge loss (both supervision losses are sketched right after this list).

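The snippet below is a minimal sketch, not part of the released code, of the two supervision losses named above, assuming a small discrete action space and per-state action logits; the function names and example values are hypothetical.

```python
import numpy as np

def cross_entropy_loss(logits, action):
    """LPG/EPG-style supervision (sketch): maximize the log-probability of the
    action taken in a near-optimal trajectory."""
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return -np.log(probs[action])

def pairwise_hinge_loss(logits, action, margin=1.0):
    """RPG-style supervision (sketch): rank the near-optimal action above every
    other action by at least `margin`."""
    others = np.delete(logits, action)
    return np.maximum(0.0, margin - (logits[action] - others)).sum()

# Example with 6 hypothetical Atari actions; action 2 was taken in a
# near-optimal trajectory.
logits = np.array([0.3, -0.1, 1.2, 0.5, 0.0, -0.4])
print(cross_entropy_loss(logits, 2))   # cross-entropy supervision (LPG/EPG)
print(pairwise_hinge_loss(logits, 2))  # pairwise hinge supervision (RPG)
```

In the actual agents these objectives are presumably computed with TensorFlow over batches of transitions from the stored near-optimal trajectories; the sketch only illustrates the per-step losses.
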
In this codebase, the folder [rpg](dopamine/dopamine/agents/rpg)
contains the code for RPG with epsilon-greedy exploration; similarly, [repg](dopamine/dopamine/agents/repg) uses EPG exploration,
and [implicit_quantilerpg](dopamine/dopamine/agents/implicit_quantilerpg)
uses implicit quantile network exploration.

The agents with relatively simple exploration strategies (rpg, lpg, epg, repg) perform well on Pong
compared to the state of the art, since there is a higher chance of hitting good trajectories in Pong.
For more complicated games, we adopt the implicit quantile network as the exploration agent.

## Hyperparameters
The hyperparameters of the networks, optimizers, etc., are the same as in the [baselines](https://github.com/KaixiangLin/dopamine/tree/master/baselines) of dopamine.
The trajectory reward threshold c (see Def. 5 in the paper (TODO)) for each game is given below; a sketch of how such a threshold might be used follows the table:

| Game | c |
|---|---|
| Boxing | 100 |
| Breakout | 400 |
| Bowling | 80 |
| BankHeist | 1100 |
| DoubleDunk | 18 |
| Pitfall | 0 |
| Pong | 1 |
| Robotank | 65 |

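A minimal sketch of how such a threshold might be used, under the assumption that a trajectory counts as near-optimal once its episode return reaches c (an illustration, not the code path of this repository; all names are hypothetical):

```python
from typing import List, Tuple

# Hypothetical trajectory representation: (list of (state, action) pairs, episode return).
Trajectory = Tuple[list, float]

# Thresholds c from the table above.
REWARD_THRESHOLD_C = {
    "Boxing": 100, "Breakout": 400, "Bowling": 80, "BankHeist": 1100,
    "DoubleDunk": 18, "Pitfall": 0, "Pong": 1, "Robotank": 65,
}

def filter_near_optimal(trajectories: List[Trajectory], game: str) -> List[Trajectory]:
    """Keep the trajectories whose episode return reaches the threshold c, so that
    they can be used for the off-policy supervised-learning stage."""
    c = REWARD_THRESHOLD_C[game]
    return [traj for traj in trajectories if traj[1] >= c]

# Example: only the second Pong trajectory (return 3 >= c = 1) is kept.
kept = filter_near_optimal([([], -21.0), ([], 3.0)], "Pong")
print(len(kept))  # 1
```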




15 changes: 15 additions & 0 deletions dopamine/dopamine/__init__.py
# coding=utf-8
# Copyright 2018 The Dopamine Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name = 'dopamine'
15 changes: 15 additions & 0 deletions dopamine/dopamine/agents/__init__.py
# coding=utf-8
# Copyright 2018 The Dopamine Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
