Learning representation and control in continuous Markov decision processes

  • Authors:
  • Sridhar Mahadevan; Mauro Maggioni; Kimberly Ferguson; Sarah Osentoski

  • Affiliations:
  • Department of Computer Science, University of Massachusetts, Amherst, MA (Mahadevan, Ferguson, Osentoski); Program in Applied Mathematics, Department of Mathematics, Yale University, New Haven, CT (Maggioni)

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2006

Abstract

This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extend the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented using classic benchmark tasks, including the inverted pendulum and the mountain car, showing the sensitivity of performance to various parameters and comparing against a parametric radial basis function method.
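The abstract describes a pipeline: sample states, build a graph over them, take the smoothest eigenvectors of the graph Laplacian as proto-value functions, extend those eigenfunctions to unseen states with the Nyström method, and run least-squares policy iteration on the resulting basis. The sketch below (Python/NumPy, not the authors' code) illustrates the two representation-learning steps under illustrative assumptions: a Gaussian similarity kernel, a symmetrized k-nearest-neighbour graph, and the parameter values shown. The features it produces would then serve as the value-function basis for LSPI.

    import numpy as np
    from scipy.spatial.distance import cdist

    def proto_value_functions(states, sigma=0.5, k=10, num_basis=8):
        """Smoothest eigenvectors of the normalized graph Laplacian built
        on a symmetrized k-nearest-neighbour graph over sampled states."""
        d2 = cdist(states, states, "sqeuclidean")
        W = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian similarities
        # keep only each state's k nearest neighbours (excluding itself),
        # then symmetrize so the graph is undirected
        idx = np.argsort(d2, axis=1)[:, 1:k + 1]
        mask = np.zeros_like(W, dtype=bool)
        mask[np.repeat(np.arange(len(states)), k), idx.ravel()] = True
        W = np.where(mask | mask.T, W, 0.0)
        deg = W.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(deg)
        # normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
        L = np.eye(len(states)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
        eigvals, eigvecs = np.linalg.eigh(L)          # ascending eigenvalues
        return eigvals[:num_basis], eigvecs[:, :num_basis], deg

    def nystrom_extend(x_new, states, eigvals, eigvecs, deg, sigma=0.5):
        """Nystrom extension: evaluate each learned eigenfunction at a
        novel state as a similarity-weighted average of its sampled values.
        Dense similarities to all samples are used here for simplicity."""
        w = np.exp(-cdist(x_new[None, :], states, "sqeuclidean").ravel()
                   / (2.0 * sigma ** 2))
        # an eigenpair (lam, phi) of L = I - D^{-1/2} W D^{-1/2} is an
        # eigenpair (1 - lam, phi) of the normalized similarity operator
        w_norm = w / np.sqrt(w.sum() * deg)
        return (w_norm @ eigvecs) / (1.0 - eigvals)

    # usage: random 2-D samples standing in for explored states
    rng = np.random.default_rng(0)
    S = rng.uniform(-1.0, 1.0, size=(200, 2))
    lam, phi, deg = proto_value_functions(S)
    features = nystrom_extend(np.array([0.1, -0.2]), S, lam, phi, deg)
    print(features.shape)   # (8,) basis-function values at the novel state

The division by 1 - lam follows the standard Nyström argument for the normalized similarity operator; the paper's exact graph construction and extension formula may differ in detail.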