This paper describes a novel machine learning framework for solving sequential decision problems modeled as Markov decision processes (MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented, based on a class of singular operators called Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented, which interleaves the computation of low-dimensional representations with the computation of approximately optimal policies. Two approaches to dimensionality reduction of MDPs are described, based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented and compared experimentally on discrete and continuous MDPs. Directions for future work are outlined in closing.
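As a minimal illustrative sketch (not the paper's reference implementation), the following Python/numpy snippet works through two of the abstract's claims on a hypothetical 20-state chain MDP under a fixed policy: the smoothest eigenvectors of the Laplacian I - P serve as a low-dimensional basis for the discounted value function (the diagonalization route), and the Drazin inverse of the same Laplacian yields the exact gain and bias of the average-reward problem. All names (P, Phi, LD, etc.) and the chain construction are assumptions made for the example; for an ergodic chain the Drazin inverse coincides with the group inverse, which is what the closed-form formula below computes.

```python
import numpy as np

n, gamma = 20, 0.95

# Fixed-policy transition matrix: a random walk on a 20-state chain
# (self-loops at both ends make the chain aperiodic and irreducible).
P = np.zeros((n, n))
for s in range(n):
    left, right = max(s - 1, 0), min(s + 1, n - 1)
    P[s, left] += 0.5
    P[s, right] += 0.5

r = np.zeros(n)
r[-1] = 1.0                      # reward only in the rightmost state

# ----- Diagonalization: Laplacian eigenvectors as basis functions -----
L = np.eye(n) - P                # random-walk Laplacian of the state graph
# L is not symmetric in general; for this reversible chain its spectrum is real,
# so we simply keep the eigenvectors with the smallest-magnitude eigenvalues.
evals, evecs = np.linalg.eig(L)
order = np.argsort(np.abs(evals))
Phi = np.real(evecs[:, order[:5]])                    # 5 "smoothest" basis functions

V_exact = np.linalg.solve(np.eye(n) - gamma * P, r)   # exact discounted value
w, *_ = np.linalg.lstsq(Phi, V_exact, rcond=None)     # project onto the basis
print("max |V - Phi w|:", np.max(np.abs(V_exact - Phi @ w)))

# ----- Drazin inverse: exact average-reward solution for the fixed policy -----
# For an ergodic chain, I - P has index 1, so its Drazin inverse equals the
# group inverse and follows from the stationary distribution pi:
#   (I - P)^D = (I - P + e pi^T)^{-1} - e pi^T
evals_P, evecs_P = np.linalg.eig(P.T)
pi = np.real(evecs_P[:, np.argmin(np.abs(evals_P - 1.0))])
pi = pi / pi.sum()
W = np.outer(np.ones(n), pi)           # limiting matrix, every row equals pi
LD = np.linalg.inv(L + W) - W          # Drazin (group) inverse of the Laplacian

gain = pi @ r                          # average reward of the policy
bias = LD @ r                          # differential (bias) values, pi @ bias = 0
# Sanity check of the average-reward evaluation equation: gain*e + h = r + P h
print("evaluation-equation residual:",
      np.max(np.abs(gain + bias - r - P @ bias)))
```

The least-squares projection onto Phi stands in for the representation step of an RPI-style loop; a full iteration would re-estimate the policy from the approximate value function and repeat, which is omitted here for brevity.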