snake-a2c

Learning Snake using A2C

A2C (Advantage Actor-Critic) adapted to learn Snake. We observe that the reward (the number of apples eaten) increases steadily as more episodes pass.

Rewards over time

Here's an illustration of a game:

Snake game visualisation

We observe that the critic's value estimate drops after each apple is eaten, reflecting the snake's tendency to die shortly after eating an apple.

Training run without entropy regularisation

MLFlow metrics over time, no entropy regularisation

From the actor and critic losses over time, we see that the agent reaches a steady state in which the actor loss is close to zero while the critic loss remains significantly higher. This indicates that the critic still struggles to accurately predict state values.
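
To make the relationship between the two losses concrete, here is a minimal sketch of a typical A2C update (assuming a PyTorch-style implementation; the function and variable names are illustrative and not taken from this repo):

```python
import torch.nn.functional as F

def a2c_losses(log_probs, values, returns):
    """Compute actor and critic losses for one batch of transitions.

    log_probs: log pi(a_t | s_t) for the actions actually taken
    values:    critic estimates V(s_t)
    returns:   bootstrapped discounted returns (targets for the critic)
    """
    # Advantage = how much better the outcome was than the critic expected.
    advantages = returns - values

    # Actor: increase the probability of actions with positive advantage.
    # The advantage is detached so the actor gradient does not flow into the critic.
    actor_loss = -(log_probs * advantages.detach()).mean()

    # Critic: regress V(s_t) towards the observed returns.
    critic_loss = F.mse_loss(values, returns)

    return actor_loss, critic_loss
```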

Entropy regularisation encourages exploration and can prevent premature convergence, which motivates the next experiment; a sketch of the modified objective is shown below.
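
A minimal sketch of how an entropy bonus is commonly folded into the combined objective (again PyTorch-style and illustrative; `entropy_coef` and `value_coef` are assumed hyperparameter names, not necessarily the values used in these runs):

```python
def combined_loss(actor_loss, critic_loss, dist,
                  entropy_coef=0.01, value_coef=0.5):
    """Combine the A2C losses with an entropy bonus.

    dist: the policy's action distribution,
          e.g. a torch.distributions.Categorical over the four moves.
    """
    # A higher-entropy (more uniform) policy reduces the loss, so the agent
    # is rewarded for keeping its action distribution spread out, which
    # encourages exploration and discourages premature convergence.
    entropy_bonus = dist.entropy().mean()
    return actor_loss + value_coef * critic_loss - entropy_coef * entropy_bonus
```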

Training run with entropy regularisation

MLFlow metrics over time, entropy regularisation

The actor loss is significantly higher; however, performance does not seem to improve over the model without entropy regularisation.