Tentative Exploration on Reinforcement Learning Algorithms for Stochastic Rewards

  • Authors:
  • Luis Peña; Antonio Latorre; José-María Peña; Sascha Ossowski

  • Affiliations:
  • Artificial Intelligence Department, Universidad Rey Juan Carlos; Computer Architecture Department, Universidad Politécnica de Madrid; Computer Architecture Department, Universidad Politécnica de Madrid; Artificial Intelligence Department, Universidad Rey Juan Carlos

  • Venue:
  • HAIS '09 Proceedings of the 4th International Conference on Hybrid Artificial Intelligence Systems
  • Year:
  • 2009

Abstract

This paper addresses the generation of mixed strategies with reinforcement learning algorithms in domains with stochastic rewards. It introduces TERSQ, a new algorithm based on the Q-learning model. Unlike other approaches for stochastic scenarios, TERSQ uses a single global exploration rate for all state-action pairs within the same run. This exploration rate is selected at the beginning of each round from a probability distribution, which is updated once the run has finished. We compare TERSQ with similar approaches that use probability distributions depending on state-action pairs. Two experimental scenarios are considered. The first deals with learning the optimal way to combine several evolutionary algorithms used simultaneously in a hybrid approach. In the second, the objective is to learn the best strategy for a set of competing agents in a combat-based video game.
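
The idea of a per-run global exploration rate can be illustrated with a minimal sketch. This is not the TERSQ update rule, which the abstract does not specify. It assumes an epsilon-greedy tabular Q-learner, a hypothetical toy task with two states and two actions, a fixed set of candidate exploration rates, and a softmax over each candidate's average return as the distribution that is updated after every run. It also treats "round" and "run" as the same unit. All names and constants (CANDIDATE_RATES, TEMPERATURE, run_episode, train) are illustrative assumptions.

```python
import math
import random

# Hypothetical settings, chosen only for illustration.
CANDIDATE_RATES = [0.05, 0.2, 0.5]   # candidate global exploration rates
TEMPERATURE = 5.0                    # softness of the distribution over rates


def softmax(values, temperature):
    """Turn scores into probabilities (numerically stable)."""
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw an index according to the given probabilities."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def run_episode(q, epsilon, rng, steps=20, alpha=0.1, gamma=0.9):
    """One run of epsilon-greedy Q-learning on a toy task.

    Assumed environment: two states that alternate, two actions, and a
    stochastic reward of 1 with probability 0.7 (action 1) or 0.3 (action 0).
    The same epsilon is used for every state-action pair in the run.
    """
    state = 0
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                      # explore
        else:
            action = max(range(2), key=lambda a: q[state][a])  # exploit
        p_reward = 0.7 if action == 1 else 0.3
        reward = 1.0 if rng.random() < p_reward else 0.0
        next_state = 1 - state
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
        total += reward
    return total


def train(runs=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    avg_return = [0.0] * len(CANDIDATE_RATES)
    counts = [0] * len(CANDIDATE_RATES)
    probs = softmax(avg_return, TEMPERATURE)   # uniform at the start
    for _ in range(runs):
        # Select one global exploration rate at the beginning of the run.
        i = sample_index(probs, rng)
        ret = run_episode(q, CANDIDATE_RATES[i], rng)
        # Update the distribution only once the run is finished.
        counts[i] += 1
        avg_return[i] += (ret - avg_return[i]) / counts[i]
        probs = softmax(avg_return, TEMPERATURE)
    return q, probs


if __name__ == "__main__":
    q_table, rate_probs = train()
    print("Q-table:", q_table)
    print("Distribution over exploration rates:", rate_probs)
```

The contrast with the approaches the abstract compares against is in where the distribution lives: here there is one distribution over exploration rates for the whole learner, whereas a per-pair scheme would keep a separate distribution for each state-action pair.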