This project aims to implement and test various algorithms related to multi-agent RL, to see whether they can lead the agents to more stable training and/or desired behavior (e.g. a Nash equilibrium).
Right now, most of the algorithms are tested on the Iterated Prisoner's Dilemma to see whether tit-for-tat behavior arises from this kind of training.
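For reference, here is a minimal sketch of the iterated game setup assumed above (the payoff values are the standard ones; the function and policy names are illustrative, not taken from this repo's code):

```python
# Actions: 0 = cooperate, 1 = defect.
# Payoff to the row player, using the standard values
# (R, S, T, P) = (3, 0, 5, 1): PAYOFFS[my_action][their_action].
PAYOFFS = [[3.0, 0.0],
           [5.0, 1.0]]

def play_iterated_pd(policy_a, policy_b, rounds=100):
    """Play the iterated game; return each player's average payoff per round.

    A policy maps the opponent's previous action (None on the first round)
    to an action in {0, 1}.
    """
    last_a = last_b = None
    total_a = total_b = 0.0
    for _ in range(rounds):
        act_a, act_b = policy_a(last_b), policy_b(last_a)
        total_a += PAYOFFS[act_a][act_b]
        total_b += PAYOFFS[act_b][act_a]
        last_a, last_b = act_a, act_b
    return total_a / rounds, total_b / rounds

# Tit-for-tat: cooperate first, then copy the opponent's last move.
tit_for_tat = lambda prev: 0 if prev is None else prev
always_defect = lambda prev: 1

print(play_iterated_pd(tit_for_tat, tit_for_tat))    # (3.0, 3.0): sustained cooperation
print(play_iterated_pd(tit_for_tat, always_defect))  # tit-for-tat retaliates after round 1
```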
I currently plan to implement three algorithms:
- Multiagent learning using a variable learning rate (https://www.sciencedirect.com/science/article/pii/S0004370202001212)
- Consensus Optimization (https://arxiv.org/abs/1705.10461); see the toy sketch after this list
- Learning with Opponent-Learning Awareness (https://arxiv.org/abs/1709.04326)
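As an illustration of the consensus-optimization idea, here is a minimal numerical sketch on a toy bilinear min-max game rather than the IPD itself; the game, step sizes, and function name are assumptions for illustration, not this project's actual training code:

```python
def bilinear_game_run(gamma, steps=500, lr=0.1):
    """Simultaneous gradient steps on the toy game f(x, y) = x * y, where
    player 1 minimises f over x and player 2 minimises -f over y.

    gamma = 0.0 is plain simultaneous gradient descent; gamma > 0.0 adds the
    consensus term gamma * grad(0.5 * ||v||^2), where v = (y, -x) is the
    players' joint gradient vector field.
    """
    x, y = 1.0, 1.0
    for _ in range(steps):
        vx, vy = y, -x                      # each player's own gradient
        # For this game, grad of 0.5 * (vx**2 + vy**2) w.r.t. (x, y) is (x, y).
        x, y = x - lr * (vx + gamma * x), y - lr * (vy + gamma * y)
    return (x * x + y * y) ** 0.5           # distance from the equilibrium (0, 0)

print(bilinear_game_run(gamma=0.0))  # plain updates spiral away from (0, 0)
print(bilinear_game_run(gamma=0.5))  # consensus term drives the players to (0, 0)
```

The added term penalizes the squared norm of the joint gradient, which damps the rotational part of the two players' updates and pulls them toward the equilibrium.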