Learning representation and control in continuous Markov decision processes

  • Authors:
  • Sridhar Mahadevan; Mauro Maggioni; Kimberly Ferguson; Sarah Osentoski

  • Affiliations:
  • Department of Computer Science, University of Massachusetts, Amherst, MA (Mahadevan, Ferguson, Osentoski); Program in Applied Mathematics, Department of Mathematics, Yale University, New Haven, CT (Maggioni)

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2006

Abstract

This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extend the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented using classic benchmark tasks, including the inverted pendulum and the mountain car, showing the sensitivity of performance to various parameters and comparing against a parametric radial basis function method.
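The abstract describes a pipeline: sample states, build a graph over them, take the smoothest eigenvectors of the graph Laplacian as proto-value functions, extend those eigenfunctions to unseen states with the Nyström method, and run least-squares policy iteration on the resulting basis. The sketch below (Python/NumPy, not the authors' code) illustrates the two representation-learning steps under illustrative assumptions: a Gaussian similarity kernel, a symmetrized k-nearest-neighbour graph, and the parameter values shown. The features it produces would then serve as the value-function basis for LSPI.

    import numpy as np
    from scipy.spatial.distance import cdist

    def proto_value_functions(states, sigma=0.5, k=10, num_basis=8):
        """Smoothest eigenvectors of the normalized graph Laplacian built
        on a symmetrized k-nearest-neighbour graph over sampled states."""
        d2 = cdist(states, states, "sqeuclidean")
        W = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian similarities
        # keep only each state's k nearest neighbours (excluding itself),
        # then symmetrize so the graph is undirected
        idx = np.argsort(d2, axis=1)[:, 1:k + 1]
        mask = np.zeros_like(W, dtype=bool)
        mask[np.repeat(np.arange(len(states)), k), idx.ravel()] = True
        W = np.where(mask | mask.T, W, 0.0)
        deg = W.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(deg)
        # normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
        L = np.eye(len(states)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
        eigvals, eigvecs = np.linalg.eigh(L)          # ascending eigenvalues
        return eigvals[:num_basis], eigvecs[:, :num_basis], deg

    def nystrom_extend(x_new, states, eigvals, eigvecs, deg, sigma=0.5):
        """Nystrom extension: evaluate each learned eigenfunction at a
        novel state as a similarity-weighted average of its sampled values.
        Dense similarities to all samples are used here for simplicity."""
        w = np.exp(-cdist(x_new[None, :], states, "sqeuclidean").ravel()
                   / (2.0 * sigma ** 2))
        # an eigenpair (lam, phi) of L = I - D^{-1/2} W D^{-1/2} is an
        # eigenpair (1 - lam, phi) of the normalized similarity operator
        w_norm = w / np.sqrt(w.sum() * deg)
        return (w_norm @ eigvecs) / (1.0 - eigvals)

    # usage: random 2-D samples standing in for explored states
    rng = np.random.default_rng(0)
    S = rng.uniform(-1.0, 1.0, size=(200, 2))
    lam, phi, deg = proto_value_functions(S)
    features = nystrom_extend(np.array([0.1, -0.2]), S, lam, phi, deg)
    print(features.shape)   # (8,) basis-function values at the novel state

The division by 1 - lam follows the standard Nyström argument for the normalized similarity operator; the paper's exact graph construction and extension formula may differ in detail.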