Proximal Policy Optimization

We provide four multi-agent extensions of PPO built on the Anakin architecture: feedforward and recurrent variants of both IPPO and MAPPO.

In all cases, IPPO denotes an implementation that follows the independent learners MARL paradigm, while MAPPO denotes one that follows the centralised training with decentralised execution (CTDE) paradigm by using a centralised critic during training. The ff and rec prefixes in the system names indicate whether the policy networks are feedforward MLPs (ff) or include a GRU memory module (rec) to help learning despite partial observability in the environment.
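To make the IPPO/MAPPO distinction concrete, here is a minimal JAX sketch of what each critic sees. It is not taken from this repository: the helper names (`init_linear`, `critic_value`), the single linear layer, and the choice to build the global state by concatenating all agents' observations are assumptions made purely for illustration.

```python
import jax
import jax.numpy as jnp


def init_linear(key, in_dim, out_dim):
    """Hypothetical helper: initialise the parameters of one linear layer."""
    w = jax.random.normal(key, (in_dim, out_dim)) * 0.1
    b = jnp.zeros((out_dim,))
    return w, b


def critic_value(params, x):
    """Tiny linear critic mapping one input vector to a scalar value."""
    w, b = params
    return (x @ w + b).squeeze(-1)


num_agents, obs_dim = 3, 4
obs_key, ind_key, cen_key = jax.random.split(jax.random.PRNGKey(0), 3)

# Per-agent local observations, shape (num_agents, obs_dim).
obs = jax.random.normal(obs_key, (num_agents, obs_dim))

# IPPO-style: the critic only ever sees an agent's own observation.
# Parameters are shared across agents here, which is an assumption.
ippo_params = init_linear(ind_key, obs_dim, 1)
ippo_values = jax.vmap(critic_value, in_axes=(None, 0))(ippo_params, obs)

# MAPPO-style: the critic sees a global state during training.
# Assumption: the global state is the concatenation of all observations.
global_state = obs.reshape(-1)
mappo_params = init_linear(cen_key, num_agents * obs_dim, 1)
mappo_value = critic_value(mappo_params, global_state)
```

In both cases the policies would still act on local observations only; the difference shown here is confined to the critic's input, which is only needed during training.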

In addition to the Anakin-based implementations, we also include a Sebulba-based implementation of ff-IPPO, which can be used with environments that are not written in JAX but adhere to the Gymnasium API.
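As a rough illustration of what adhering to the Gymnasium API means in practice, the sketch below steps a plain Python environment through one episode. `CountingEnv` and `rollout` are hypothetical stand-ins written for this example, and the environment is single-agent for brevity. It mimics only the `reset` and `step` signatures of Gymnasium and does not show how the Sebulba-based system itself is structured.

```python
class CountingEnv:
    """Hypothetical non-JAX environment mimicking Gymnasium's reset/step signatures."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        # Gymnasium-style reset returns (observation, info).
        self.t = 0
        return float(self.t), {}

    def step(self, action):
        # Gymnasium-style step returns
        # (observation, reward, terminated, truncated, info).
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        terminated = False
        truncated = self.t >= self.horizon  # time-limit cut-off
        return float(self.t), reward, terminated, truncated, {}


def rollout(env, policy, seed=0):
    """Run one episode and return (episode_return, episode_length)."""
    obs, info = env.reset(seed=seed)
    total, steps, done = 0.0, 0, False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        steps += 1
        done = terminated or truncated
    return total, steps


# A constant policy that always picks action 1.
episode_return, episode_length = rollout(CountingEnv(horizon=5), lambda obs: 1)
```

Because such an environment runs as ordinary Python on the host, it cannot be compiled together with the learner, which is why a separate architecture is used for it.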

Relevant papers: