Learning RL by implementing and analysing different RL methods from scratch.
Directory | Game | Number of agents | RL method
---|---|---|---
nim-dqn | Nim-21 | 2 | Deep Q-Network
nim-a2c | Nim-21 | 2 | Advantage Actor-Critic
matching-pennies-a2c | Matching Pennies | 2 | Advantage Actor-Critic
snake-a2c | Snake | 1 | Advantage Actor-Critic
snake-ppo | Snake | 1 | Proximal Policy Optimisation
I'm also using this project to learn more about MLflow. Some of the training scripts depend on an actively running tracking server. Please check the MLflow documentation for how to start a tracking server, and set the `MLFLOW_URI` environment variable to the tracking server's URL.
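As a minimal sketch, a local tracking server can be started and pointed at like this (the host, port, and backend store path are illustrative; `MLFLOW_URI` is this project's variable name, not a standard MLflow one):

```shell
# Start a local MLflow tracking server on an example port
mlflow server --host 127.0.0.1 --port 5000

# In the shell that runs the training scripts, point MLFLOW_URI at it
export MLFLOW_URI=http://127.0.0.1:5000
```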