Uses PPO to learn Snake. PPO builds on TRPO (which I haven't implemented), but replaces TRPO's Kullback-Leibler divergence constraint with a clipped surrogate objective, which makes PPO more compute efficient. This implementation has a better memory buffer than the A2C implementation (see the `PPOMemory` class, versus a simple `deque`), which makes it possible to implement Generalised Advantage Estimation (GAE). This matters because PPO, being on-policy, copes less well with sparse rewards than off-policy methods. The actor and critic are split into separate networks, which lets us apply clipping to the actor while leaving the critic unconstrained. We also run multiple epochs of updates on the same rollout data for better sample efficiency.
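A minimal sketch of such a rollout buffer with GAE is shown below. It is illustrative only: the field names, `store`/`generate_batches` methods, and default `gamma`/`lam` values are assumptions, not necessarily those used by the `PPOMemory` class in this repo.

```python
import numpy as np

class PPOMemory:
    """Illustrative rollout buffer for PPO with Generalised Advantage Estimation."""

    def __init__(self, batch_size):
        self.states, self.actions, self.log_probs = [], [], []
        self.values, self.rewards, self.dones = [], [], []
        self.batch_size = batch_size

    def store(self, state, action, log_prob, value, reward, done):
        # One transition per environment step.
        self.states.append(state)
        self.actions.append(action)
        self.log_probs.append(log_prob)
        self.values.append(value)
        self.rewards.append(reward)
        self.dones.append(done)

    def compute_gae(self, last_value=0.0, gamma=0.99, lam=0.95):
        # GAE: exponentially weighted sum of TD errors, swept backwards in time.
        values = np.append(np.array(self.values, dtype=np.float32), last_value)
        advantages = np.zeros(len(self.rewards), dtype=np.float32)
        gae = 0.0
        for t in reversed(range(len(self.rewards))):
            mask = 1.0 - float(self.dones[t])          # zero out across episode ends
            delta = self.rewards[t] + gamma * values[t + 1] * mask - values[t]
            gae = delta + gamma * lam * mask * gae
            advantages[t] = gae
        return advantages

    def generate_batches(self):
        # Shuffled minibatch indices, reused across the multiple update epochs.
        idx = np.random.permutation(len(self.states))
        for start in range(0, len(idx), self.batch_size):
            yield idx[start:start + self.batch_size]

    def clear(self):
        # PPO is on-policy, so the buffer is emptied after every update.
        self.__init__(self.batch_size)
```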
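The update step itself might look roughly like the following, assuming PyTorch, a separate actor and critic each with its own optimiser, and an actor that returns a `torch.distributions.Categorical`. The function name, hyperparameter values, and object names here are placeholders rather than the exact ones used in this repo.

```python
import numpy as np
import torch

def learn(actor, critic, memory, actor_opt, critic_opt, n_epochs=4, clip_eps=0.2):
    advantages = torch.tensor(memory.compute_gae())
    states = torch.tensor(np.array(memory.states), dtype=torch.float32)
    actions = torch.tensor(memory.actions)
    old_log_probs = torch.tensor(memory.log_probs)
    values = torch.tensor(memory.values)
    returns = advantages + values                      # regression targets for the critic

    for _ in range(n_epochs):                          # reuse the same rollout several times
        for batch in memory.generate_batches():
            batch = torch.tensor(batch, dtype=torch.long)

            dist = actor(states[batch])                # assumed to return a Categorical
            new_log_probs = dist.log_prob(actions[batch])
            ratio = (new_log_probs - old_log_probs[batch]).exp()

            # Clipped surrogate objective: the clip replaces TRPO's KL constraint.
            surr1 = ratio * advantages[batch]
            surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages[batch]
            actor_loss = -torch.min(surr1, surr2).mean()

            # The critic stays unconstrained: a plain MSE fit to the returns.
            critic_loss = ((critic(states[batch]).squeeze(-1) - returns[batch]) ** 2).mean()

            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()

            critic_opt.zero_grad()
            critic_loss.backward()
            critic_opt.step()

    memory.clear()
```

Splitting the networks like this means the clipping only ever touches the policy gradient, while the critic is free to move as far as its regression loss demands.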