This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework are: (i) a general scheme for constructing representations, or basis functions, by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this scheme in which global basis functions, called proto-value functions (PVFs), are formed from the eigenvectors of the graph Laplacian on an undirected graph built from state transitions induced by the MDP; (iii) a three-phase procedure called representation policy iteration (RPI), comprising a sample-collection phase, a representation-learning phase that constructs basis functions from the samples, and a final parameter-estimation phase that determines an (approximately) optimal policy within the linear subspace spanned by the current basis functions; (iv) a specific instantiation of RPI that uses least-squares policy iteration (LSPI) as the parameter-estimation method; (v) several strategies for scaling the approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions and a Kronecker sum factorization that yields compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of discrete and continuous control tasks that both illustrate the concepts and provide a benchmark for evaluating the approach. Many challenges remain in scaling the framework to large MDPs, and several elaborations of the framework are briefly summarized at the end.
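To make components (i) and (ii) concrete, the sketch below builds a PVF basis for a small discrete MDP: it estimates an undirected state graph from sampled transitions, forms the combinatorial graph Laplacian, and keeps its smoothest eigenvectors as global basis functions. This is a minimal NumPy sketch under simplifying assumptions (unit edge weights, combinatorial rather than normalized Laplacian); the name build_pvf_basis is illustrative, not from the paper.

```python
import numpy as np

def build_pvf_basis(transitions, n_states, k):
    """Proto-value functions (sketch): the k smoothest eigenvectors of the
    combinatorial graph Laplacian of an undirected state graph estimated
    from samples.  `transitions` is an iterable of (s, s_next) index pairs."""
    # Adjacency matrix of the undirected graph induced by observed transitions.
    W = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        if s != s_next:
            W[s, s_next] = 1.0
            W[s_next, s] = 1.0          # symmetrize: edges are undirected
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # combinatorial graph Laplacian
    # Eigenvectors with the smallest eigenvalues vary most smoothly over the
    # graph; these serve as global basis functions (PVFs).
    _, eigvecs = np.linalg.eigh(L)      # eigh returns ascending eigenvalues
    return eigvecs[:, :k]               # each column is one basis function


# Example: PVFs for a 20-state chain explored by a random walk.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s, samples = 0, []
    for _ in range(2000):
        s_next = min(max(s + rng.choice([-1, 1]), 0), 19)
        samples.append((s, s_next))
        s = s_next
    phi = build_pvf_basis(samples, n_states=20, k=5)   # 20 x 5 feature matrix
```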
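Component (iv) plugs LSPI into the parameter-estimation phase of RPI; the inner step of LSPI is an LSTDQ solve over state-action features built from the state basis. The following is a hedged sketch of that step; the helper names (lstdq, feat, policy) and the small ridge regularizer are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def lstdq(samples, phi, n_actions, policy, gamma=0.95, reg=1e-6):
    """One LSTDQ solve (sketch): a least-squares fixed-point estimate of Q^pi
    in the span of the state features `phi` (e.g. rows of a PVF basis).
    `samples` holds (s, a, r, s_next) tuples; `policy(s)` returns the action
    of the policy being evaluated."""
    d = phi.shape[1]
    k = d * n_actions
    A = reg * np.eye(k)                 # small ridge term keeps A invertible
    b = np.zeros(k)

    def feat(s, a):
        # State-action features: copy the state features into action a's block.
        f = np.zeros(k)
        f[a * d:(a + 1) * d] = phi[s]
        return f

    for s, a, r, s_next in samples:
        f = feat(s, a)
        f_next = feat(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)        # weights of the approximate Q-function
```

In a full RPI loop one would alternate: collect samples, (re)build the basis from them, solve for weights with a step like this, and act greedily with respect to the resulting Q estimate until the policy stabilizes.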
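For continuous state spaces, component (v) uses the Nyström extension to evaluate eigenfunctions at states that were not in the sample used to build the graph. The sketch below shows the generic Nyström formula for eigenvectors of a symmetric weight matrix, which the paper adapts to Laplacian eigenfunctions; the function name and the exact normalization here are assumptions.

```python
import numpy as np

def nystrom_extend(x_new, X_sampled, eigvecs, eigvals, weight_fn):
    """Generic Nystrom extension (sketch): interpolate eigenvectors of a
    symmetric weight matrix, computed on the subsampled states X_sampled,
    to a novel state x_new.  `weight_fn(x, y)` is the graph weight (e.g. a
    Gaussian kernel); eigvecs[:, i] has eigenvalue eigvals[i] on the sample."""
    w = np.array([weight_fn(x_new, x) for x in X_sampled])
    # u_i(x) ~= (1 / lambda_i) * sum_j w(x, x_j) * u_i(x_j)
    return (w @ eigvecs) / eigvals
```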
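Also under component (v), product state spaces such as factored MDPs admit compact eigenfunctions because the Laplacian of a Cartesian product graph is the Kronecker sum of the factor Laplacians. The sketch below illustrates that linear-algebra fact; product_space_pvfs is an illustrative name, and handling real factored MDPs would require additional bookkeeping.

```python
import numpy as np

def product_space_pvfs(L1, L2, k):
    """PVFs for a product state space from factor Laplacians (sketch).
    The Laplacian of the Cartesian product of two graphs is the Kronecker sum
    kron(L1, I) + kron(I, L2), so its eigenvectors are Kronecker products of
    factor eigenvectors and its eigenvalues are sums of factor eigenvalues."""
    w1, V1 = np.linalg.eigh(L1)
    w2, V2 = np.linalg.eigh(L2)
    # Enumerate eigenvalue pairs and keep the k smoothest combined functions.
    pairs = sorted(((w1[i] + w2[j], i, j)
                    for i in range(len(w1))
                    for j in range(len(w2))))[:k]
    # Only k Kronecker products of small factor eigenvectors are ever formed,
    # avoiding an eigendecomposition of the full product-space Laplacian.
    return np.column_stack([np.kron(V1[:, i], V2[:, j]) for _, i, j in pairs])
```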