On the usefulness of opponent modeling: the Kuhn Poker case study

  • Authors: Alessandro Lazaric, Mario Quaresimale, Marcello Restelli

  • Affiliations: Politecnico di Milano, Milan, Italy (all authors)

  • Venue: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3
  • Year: 2008


Abstract

The application of reinforcement learning algorithms to Partially Observable Stochastic Games (POSGs) is challenging, since each agent does not have access to the whole state information and, in the case of concurrent learners, the environment has non-stationary dynamics. These problems could be partially overcome if the policies followed by the other agents were known, and, for this reason, many approaches try to estimate them through so-called opponent-modeling techniques. Although much research has been devoted to the accuracy of the estimation of opponents' policies, little attention has been paid to understanding in which situations these estimated models are actually useful for improving the agent's performance. This paper presents a preliminary study of the impact of using opponent-modeling techniques to learn the solution of a POSG. Our main purpose is to measure the gain in performance that can be obtained by exploiting information about the policies of other agents, and how this gain is affected by the accuracy of the estimated models. Our analysis focuses on a small two-agent POSG: Kuhn Poker, a simplified version of classical poker. Three cases are considered according to the agent's knowledge of the opponent's policy: no knowledge, perfect knowledge, and imperfect knowledge. The aim is to identify the maximum error that the estimated model can tolerate without leading to performance lower than that reachable without opponent-modeling information. Finally, we show how the results of this analysis can be used to improve the performance of a reinforcement-learning algorithm coupled with a simple opponent-modeling technique.
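
Because the domain is so small, the setting is easy to sketch in code. The following is a minimal, hypothetical Python sketch (not the authors' implementation) of the Kuhn Poker game described in the abstract, together with a simple frequency-based opponent model of the kind the paper refers to. All names (play_hand, FrequencyOpponentModel, the example policies) are illustrative assumptions.

```python
# Hypothetical sketch of Kuhn Poker: three cards {J, Q, K}; each player
# antes 1 chip and receives one card.  Player 0 may check or bet 1; facing
# a bet, the other player may fold or call; check-check or a called bet
# goes to a showdown won by the higher card.
import random
from collections import defaultdict

CARDS = [0, 1, 2]  # 0 = Jack, 1 = Queen, 2 = King


def play_hand(policy0, policy1, rng=random):
    """Play one hand; returns player 0's payoff (player 1 gets the negative).

    A policy maps (card, history) to an action string.  Histories are
    "" (player 0 to act), "c"/"b" (player 1 facing a check/bet) and
    "cb" (player 0 facing a bet after checking).
    """
    c0, c1 = rng.sample(CARDS, 2)
    a0 = policy0(c0, "")                     # "check" or "bet"
    if a0 == "bet":
        a1 = policy1(c1, "b")                # "fold" or "call"
        if a1 == "fold":
            return 1                         # player 0 wins the antes
        return 2 if c0 > c1 else -2          # called bet: pot of 4
    a1 = policy1(c1, "c")                    # "check" or "bet"
    if a1 == "check":
        return 1 if c0 > c1 else -1          # showdown for the antes
    a0b = policy0(c0, "cb")                  # "fold" or "call"
    if a0b == "fold":
        return -1
    return 2 if c0 > c1 else -2              # called bet: pot of 4


class FrequencyOpponentModel:
    """Estimate P(action | history) for the opponent from observed actions."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, history, action):
        self.counts[history][action] += 1

    def prob(self, history, action):
        total = sum(self.counts[history].values())
        # Uniform prior over the two legal actions before any observation.
        return 0.5 if total == 0 else self.counts[history][action] / total


if __name__ == "__main__":
    rng = random.Random(0)

    def aggressive(card, history):            # always bets / calls
        return "call" if history in ("b", "cb") else "bet"

    def passive(card, history):               # never bets, always calls
        return "call" if history in ("b", "cb") else "check"

    model = FrequencyOpponentModel()

    def observed_passive(card, history):
        action = passive(card, history)
        model.observe(history, action)         # record the opponent's choice
        return action

    total = sum(play_hand(aggressive, observed_passive, rng)
                for _ in range(10_000))
    print("average payoff of player 0:", total / 10_000)
    print("estimated P(call | facing a bet):", model.prob("b", "call"))
```

Such an estimated model could then be plugged into a reinforcement-learning agent's value updates (e.g. to weight expected returns by the opponent's predicted action), which is the kind of combination whose usefulness the paper evaluates as a function of the model's estimation error.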