Basis function discovery using spectral clustering and bisimulation metrics

Authors:
Gheorghe Comanici;Doina Precup
Affiliations:
School of Computer Science, McGill University, Montreal, QC, Canada;School of Computer Science, McGill University, Montreal, QC, Canada
Venue:
ALA'11 Proceedings of the 11th international conference on Adaptive and Learning Agents
Year:
2011

References:
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Metrics for finite Markov decision processes. UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Proto-value functions: developmental reinforcement learning. ICML '05 Proceedings of the 22nd international conference on Machine learning
Automatic basis function construction for approximate dynamic programming and reinforcement learning. ICML '06 Proceedings of the 23rd international conference on Machine learning
Analyzing feature generation for value-function approximation. Proceedings of the 24th international conference on Machine learning
Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes. The Journal of Machine Learning Research
An analysis of Laplacian methods for value function approximation in MDPs. IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence

Abstract

We study the problem of automatically generating features for function approximation in reinforcement learning. We build on the work of Mahadevan and his colleagues, who pioneered the use of spectral clustering methods for basis function construction. Their methods operate on a graph that captures state adjacency. We instead use bisimulation metrics to provide the state distances for spectral clustering. The advantage of these metrics is that they incorporate reward information, in addition to state transition information, in a natural way. We provide bisimulation metric bounds for general feature maps. This result suggests a new way of generating features, with strong theoretical guarantees on the quality of the resulting approximation. We also demonstrate empirically that approximation quality improves when bisimulation metrics are used in the basis function construction process.
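
The pipeline the abstract describes (bisimulation distances fed into a spectral method to obtain basis functions) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the abstract: the MDP is deterministic, so the Kantorovich distance between successor distributions reduces to the distance between the two successor states; the metric is a fixed point of d(s,t) = max_a [ c_r |r(s,a) - r(t,a)| + c_t d(s'_a, t'_a) ] with hypothetical weights `c_r` and `c_t`; distances are turned into similarities with a Gaussian kernel; and features are the lowest eigenvectors of the normalized graph Laplacian. The function names and the four-state toy MDP are also illustrative.

```python
import numpy as np


def bisimulation_metric(next_state, reward, c_r=0.1, c_t=0.9,
                        tol=1e-10, max_iter=1000):
    """Iterate d <- max_a [c_r * |r_s - r_t| + c_t * d(succ_s, succ_t)].

    Assumes a deterministic MDP: next_state[s, a] is the successor index
    and reward[s, a] the immediate reward. With c_t < 1 the update is a
    contraction, so the iteration converges to a unique fixed point.
    """
    n_states, n_actions = reward.shape
    d = np.zeros((n_states, n_states))
    for _ in range(max_iter):
        new_d = np.zeros_like(d)
        for a in range(n_actions):
            # Pairwise reward differences under action a.
            r_diff = np.abs(reward[:, a][:, None] - reward[:, a][None, :])
            # Current distance between the successors of every state pair.
            succ = next_state[:, a]
            d_succ = d[np.ix_(succ, succ)]
            new_d = np.maximum(new_d, c_r * r_diff + c_t * d_succ)
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d


def spectral_features(d, k, sigma=None):
    """Return k features from the normalized Laplacian of a Gaussian
    similarity graph built on the distance matrix d (assumed kernel)."""
    positive = d[d > 0]
    if sigma is None:
        # Heuristic bandwidth: median of the nonzero distances.
        sigma = np.median(positive) if positive.size else 1.0
    w = np.exp(-(d / sigma) ** 2)
    inv_sqrt_deg = 1.0 / np.sqrt(w.sum(axis=1))
    laplacian = np.eye(len(d)) - inv_sqrt_deg[:, None] * w * inv_sqrt_deg[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smoothest vectors.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, :k]


# Toy MDP: states 0 and 1 behave identically (both move to state 2 with
# reward 0), state 2 moves to state 3 with reward 1, state 3 is absorbing.
next_state = np.array([[2, 2], [2, 2], [3, 3], [3, 3]])
reward = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])

d = bisimulation_metric(next_state, reward)
features = spectral_features(d, k=2)  # one row of basis values per state
```

In this toy problem states 0 and 1 are bisimilar, so their distance is zero even though no edge connects them, while states 2 and 3 are adjacent but separated by a reward difference. An adjacency graph alone would not capture either fact, which is the motivation the abstract gives for using reward-aware distances.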