update readme
KaixiangLin committed Jun 25, 2019
1 parent f216fdf commit 4751db4
Showing 2 changed files with 9 additions and 6 deletions.
13 changes: 8 additions & 5 deletions README.md
@@ -1,6 +1,6 @@
# Ranking Policy Gradient
Ranking Policy Gradient (RPG) is a sample-efficient policy gradient method
that learns an optimal ranking of actions with respect to the long-term reward.
This codebase contains the implementation of RPG using the
[dopamine](https://github.com/google/dopamine) framework.
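The "ranking of actions" idea can be sketched as a pairwise ranking loss over action scores. This is a hypothetical illustration of that intuition, not the code in this repository; the function name and logistic loss are assumptions for demonstration only:

```python
import math

def pairwise_ranking_loss(scores, best_action):
    """Hypothetical sketch: penalize every action whose score is not
    ranked below the action taken in a near-optimal trajectory."""
    loss = 0.0
    for a, s in enumerate(scores):
        if a == best_action:
            continue
        margin = scores[best_action] - s
        # Logistic pairwise loss: small when best_action outranks action a.
        loss += math.log(1.0 + math.exp(-margin))
    return loss

# Ranking the demonstrated action highest yields a smaller loss.
good = pairwise_ranking_loss([2.0, 0.1, -1.0], best_action=0)
bad = pairwise_ranking_loss([2.0, 0.1, -1.0], best_action=2)
```

Minimizing such a loss only requires the relative order of action scores to be correct, which is the sense in which RPG learns a ranking rather than exact action values.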

@@ -41,8 +41,11 @@ To reproduce the results in the paper, please refer to the instructions [here]

If you use this RPG implementation in your work, please consider citing the following papers:
```
@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}
```

## Acknowledgments
TODO(dopamine framework, fundings).
2 changes: 1 addition & 1 deletion code.md
@@ -46,7 +46,7 @@ For more complicated games, we adopt an implicit quantile network as the exploration

## Hyperparameters
The hyperparameters of the networks, optimizers, etc., are the same as the [baselines](https://github.com/KaixiangLin/dopamine/tree/master/baselines) in dopamine.
The trajectory reward threshold c (see Def 5 in the [paper](https://arxiv.org/abs/1906.09674)) for each game is given as follows:

| game | c |
|---|---|
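A threshold like c can be read as a filter that keeps only near-optimal trajectories. The snippet below is a minimal sketch of that filtering step, assuming trajectories are lists of (action, reward) pairs; it is not taken from the released code:

```python
def filter_trajectories(trajectories, c):
    """Keep trajectories whose total reward meets the threshold c."""
    return [t for t in trajectories if sum(r for _, r in t) >= c]

trajs = [
    [("a0", 1.0), ("a1", 2.0)],   # total reward 3.0
    [("a2", 0.5), ("a0", 0.5)],   # total reward 1.0
]
kept = filter_trajectories(trajs, c=2.0)
```

With c=2.0, only the first trajectory survives the filter, so a per-game choice of c controls how selective the collected supervision is.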
