update readme
KaixiangLin committed Jun 25, 2019
1 parent f216fdf commit 4751db4
Showing 2 changed files with 9 additions and 6 deletions.
13 changes: 8 additions & 5 deletions README.md
@@ -1,6 +1,6 @@
# Ranking Policy Gradient
Ranking Policy Gradient (RPG) is a sample-efficient policy gradient method
that learns an optimal ranking of actions with respect to the long-term reward.
This codebase contains the implementation of RPG using the
[dopamine](https://github.com/google/dopamine) framework.
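The "ranking of actions" idea can be sketched as a pairwise ranking loss over action scores. This is a hypothetical illustration of that intuition, not the code in this repository; the function name and logistic loss are assumptions for demonstration only:

```python
import math

def pairwise_ranking_loss(scores, best_action):
    """Hypothetical sketch: penalize every action whose score is not
    ranked below the action taken in a near-optimal trajectory."""
    loss = 0.0
    for a, s in enumerate(scores):
        if a == best_action:
            continue
        margin = scores[best_action] - s
        # Logistic pairwise loss: small when best_action outranks action a.
        loss += math.log(1.0 + math.exp(-margin))
    return loss

# Ranking the demonstrated action highest yields a smaller loss.
good = pairwise_ranking_loss([2.0, 0.1, -1.0], best_action=0)
bad = pairwise_ranking_loss([2.0, 0.1, -1.0], best_action=2)
```

Minimizing such a loss only requires the relative order of action scores to be correct, which is the sense in which RPG learns a ranking rather than exact action values.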

@@ -41,8 +41,11 @@ To reproduce the results in the paper, please refer to the instructions [here]

If you use this RPG implementation in your work, please consider citing the following papers:
```
@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}
```

## Acknowledgments
TODO(dopamine framework, fundings).
2 changes: 1 addition & 1 deletion code.md
@@ -46,7 +46,7 @@ For more complicated games, we adopt an implicit quantile network as the exploration

## Hyperparameters
The hyperparameters of the networks, optimizers, etc., are the same as the [baselines](https://github.com/KaixiangLin/dopamine/tree/master/baselines) in dopamine.
The trajectory reward threshold c (see Def 5 in the [paper](https://arxiv.org/abs/1906.09674)) for each game is given as follows:

| game | c |
|---|---|
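A threshold like c can be read as a filter that keeps only near-optimal trajectories. The snippet below is a minimal sketch of that filtering step, assuming trajectories are lists of (action, reward) pairs; it is not taken from the released code:

```python
def filter_trajectories(trajectories, c):
    """Keep trajectories whose total reward meets the threshold c."""
    return [t for t in trajectories if sum(r for _, r in t) >= c]

trajs = [
    [("a0", 1.0), ("a1", 2.0)],   # total reward 3.0
    [("a2", 0.5), ("a0", 0.5)],   # total reward 1.0
]
kept = filter_trajectories(trajs, c=2.0)
```

With c=2.0, only the first trajectory survives the filter, so a per-game choice of c controls how selective the collected supervision is.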
