The purpose of this experiment was to check whether A2C would find the mixed Nash equilibrium in the matching pennies game. Large batch sizes do approach this equilibrium, but there appears to be bias in the estimate when the payoffs are unbalanced (e.g. a -1,1 payoff for no match and a 2,-2 payoff for a match).
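For reference, the equilibrium that A2C is being compared against can be computed in closed form from the indifference conditions of a 2x2 zero-sum game. The sketch below is not part of the experiment code. The function name `mixed_equilibrium_2x2` is hypothetical, and the payoff matrix is an assumption: it reads the example above as the row player being the "matcher" who receives 2 on either match and -1 on either mismatch.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(a):
    """Fully mixed equilibrium of a 2x2 zero-sum game.

    `a` is the row player's payoff matrix; the column player receives -a.
    Returns (p, q, value) where p is the probability that the row player
    picks action 0, q is the same for the column player, and value is the
    row player's expected payoff at equilibrium.
    """
    (a00, a01), (a10, a11) = a
    denom = a00 - a01 - a10 + a11
    if denom == 0:
        raise ValueError("indifference conditions are degenerate")
    # Row player mixes so that the column player is indifferent.
    p = Fraction(a11 - a10, denom)
    # Column player mixes so that the row player is indifferent.
    q = Fraction(a11 - a01, denom)
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no fully mixed equilibrium (a pure one exists)")
    value = Fraction(a00 * a11 - a01 * a10, denom)
    return p, q, value


# Assumed reading of the unbalanced example: 2 for a match, -1 otherwise.
unbalanced = [[2, -1], [-1, 2]]
# Standard matching pennies for comparison: 1 for a match, -1 otherwise.
balanced = [[1, -1], [-1, 1]]

print(mixed_equilibrium_2x2(unbalanced))
print(mixed_equilibrium_2x2(balanced))
```

Under this assumed matrix both players still mix 50/50, and only the value of the game changes (1/2 to the matcher instead of 0). If the actual payoffs differ between the two matching outcomes, the equilibrium probabilities would move away from 1/2.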