This project tests and implements various multi-agent reinforcement learning (RL) algorithms to see whether they lead agents to more stable training and/or the desired behavior (a Nash equilibrium).
Right now, we test most of the algorithms on the Iterated Prisoner's Dilemma to see whether tit-for-tat behavior arises from this kind of training.
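
For reference, here is a minimal sketch of the Iterated Prisoner's Dilemma with a hand-coded tit-for-tat strategy, i.e. the behavior we hope training will discover. The payoff values (3, 0, 5, 1) are the conventional textbook ones and are an assumption, as are all function names; none of this is taken from the project's code.

```python
# Actions: 0 = cooperate, 1 = defect.
COOPERATE, DEFECT = 0, 1

# PAYOFFS[(a, b)] = (payoff to player A, payoff to player B).
# Conventional values, assumed here for illustration only.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous action."""
    return COOPERATE if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect on every round, regardless of history."""
    return DEFECT


def play(strategy_a, strategy_b, rounds=10):
    """Play the iterated game and return the total payoff of each player."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        # Each strategy only sees the other player's past actions.
        a = strategy_a(history_b)
        b = strategy_b(history_a)
        payoff_a, payoff_b = PAYOFFS[(a, b)]
        total_a += payoff_a
        total_b += payoff_b
        history_a.append(a)
        history_b.append(b)
    return total_a, total_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))    # (30, 30)
    print(play(tit_for_tat, always_defect))  # (9, 14)
```

Two tit-for-tat players cooperate on every round, while against a pure defector tit-for-tat loses only the first round and then retaliates.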
So far, I plan to implement 3 algorithms:
- Multiagent learning using a variable learning rate (
- Consensus Optimization from
- Learning with Opponent-Learning Awareness (
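
To give a feel for why such methods can stabilize training, here is a toy sketch of the idea behind consensus optimization as I understand it: add a penalty on the squared norm of the players' joint gradient field to the usual simultaneous gradient step. The bilinear game, the step size `h`, the weight `gamma`, and all function names are assumptions chosen for illustration, not this project's implementation.

```python
import math


def vector_field(x, y):
    """Joint gradient field of a toy zero-sum game (assumed example).

    Player 1 maximizes f(x, y) = x * y over x, so df/dx = y.
    Player 2 maximizes g(x, y) = -x * y over y, so dg/dy = -x.
    """
    return y, -x


def grad_regularizer(x, y):
    """Gradient of L = 0.5 * ||v||^2 = 0.5 * (y**2 + x**2)."""
    return x, y


def run(steps=100, h=0.1, gamma=0.0, start=(1.0, 1.0)):
    """Iterate w <- w + h * (v(w) - gamma * grad L(w)).

    gamma = 0 gives plain simultaneous gradient ascent.
    Returns the distance of the final point from the equilibrium (0, 0).
    """
    x, y = start
    for _ in range(steps):
        vx, vy = vector_field(x, y)
        gx, gy = grad_regularizer(x, y)
        x, y = x + h * (vx - gamma * gx), y + h * (vy - gamma * gy)
    return math.hypot(x, y)


if __name__ == "__main__":
    print("plain gradient steps:", run(gamma=0.0))  # spirals outward
    print("with consensus term: ", run(gamma=1.0))  # contracts toward (0, 0)
```

On this game, each plain step multiplies the distance from the equilibrium by sqrt(1 + h^2), so the iterates spiral outward, while the regularized step multiplies it by sqrt((1 - h * gamma)^2 + h^2), which is below 1 for the values used here.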