Basis function discovery using spectral clustering and bisimulation metrics

  • Authors: Gheorghe Comanici; Doina Precup
  • Affiliations: McGill University, Montreal, QC, Canada (both authors)
  • Venue: The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
  • Year: 2011

Abstract

Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision making for intelligent agents acting in stochastic environments. One of the important challenges facing such agents in practical applications is finding a suitable way to represent the state space, so that a good way of behaving can be learned efficiently. In this paper, we focus on learning a good policy when function approximation must be used to represent the value function. In this case, states are mapped into feature vectors, and a set of parameters is learned, which allows us to approximate the value of any given state. Theoretically, the quality of the approximation that can be obtained depends on the set of features. In practice, the feature set affects not only the quality of the solution obtained, but also the speed of learning.
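
The role of the feature set described in the abstract can be illustrated with a minimal sketch of linear value function approximation. This is not the basis discovery method of the paper. It is a generic TD(0) example on a hypothetical five-state deterministic chain, and every name in it (`step`, `one_hot`, `constant`, `td0`, the chain itself, the step size and the discount factor) is an assumption made for illustration. Each state is mapped to a feature vector, a parameter vector `theta` is learned, and the value estimate is the dot product of the two. Running the same learner with two different feature maps shows how the choice of features bounds the quality of the approximation.

```python
# Hypothetical toy problem: a deterministic chain 0 -> 1 -> 2 -> 3 -> 4,
# where state 4 is terminal and entering it yields reward 1.
GAMMA = 0.9
N_STATES = 5
TERMINAL = N_STATES - 1


def step(state):
    """Move one state to the right; reward 1 only when the terminal state is reached."""
    next_state = state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def one_hot(state):
    """Rich feature map: one indicator feature per non-terminal state."""
    return [1.0 if i == state else 0.0 for i in range(TERMINAL)]


def constant(state):
    """Coarse feature map: every state looks the same to the learner."""
    return [1.0]


def value(theta, features, state):
    """Linear value estimate theta . phi(state); the terminal state has value 0."""
    if state == TERMINAL:
        return 0.0
    return sum(w * x for w, x in zip(theta, features(state)))


def td0(features, n_features, episodes=2000, alpha=0.1):
    """Learn theta with TD(0) updates over repeated passes through the chain."""
    theta = [0.0] * n_features
    for _ in range(episodes):
        state = 0
        while state != TERMINAL:
            next_state, reward = step(state)
            # Temporal-difference error for the observed transition.
            delta = (reward
                     + GAMMA * value(theta, features, next_state)
                     - value(theta, features, state))
            phi = features(state)
            theta = [w + alpha * delta * x for w, x in zip(theta, phi)]
            state = next_state
    return theta


# Exact values for this chain: V(s) = GAMMA ** (number of steps before the rewarded one).
TRUE_VALUES = [GAMMA ** (TERMINAL - 1 - s) for s in range(TERMINAL)]


def max_error(theta, features):
    """Largest absolute gap between the learned and the exact values."""
    return max(abs(value(theta, features, s) - TRUE_VALUES[s])
               for s in range(TERMINAL))


theta_rich = td0(one_hot, TERMINAL)
theta_coarse = td0(constant, 1)
print("max error, one-hot features: ", max_error(theta_rich, one_hot))
print("max error, constant feature:", max_error(theta_coarse, constant))
```

With one-hot features the learner can represent the exact values, so its error shrinks toward zero. With the single constant feature every state is forced to share one value, so a sizeable error remains no matter how long learning runs. Real problems sit between these two extremes, which is why the choice of basis functions matters.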