Basis functions derived from an undirected graph connecting nearby samples from a Markov decision process (MDP) have proven useful for approximating value functions. The success of this technique is attributed to the smoothness of the basis functions with respect to the state space geometry. This paper explores the properties of bases created from directed graphs, which are a more natural fit for expressing state connectivity. Digraphs capture the behavior of non-reversible MDPs, whose value functions may not be smooth across adjacent states. We provide an analysis using the Dirichlet sum of the directed graph Laplacian to show how the smoothness of the basis functions is affected by the graph's invariant distribution. Experiments in discrete and continuous MDPs with non-reversible actions demonstrate a significant improvement in the policies learned using directed graph bases.
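The construction the abstract refers to can be illustrated with a minimal sketch. The code below builds basis functions as the smoothest eigenvectors of Chung's normalized directed graph Laplacian, L = I − (Φ^{1/2} P Φ^{−1/2} + Φ^{−1/2} Pᵀ Φ^{1/2})/2, where P is a random walk on the digraph and Φ is the diagonal matrix of its invariant distribution. The toy graph (a directed cycle with one shortcut), the lazy-walk mixing used to guarantee a unique invariant distribution, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def stationary_distribution(P, iters=2000):
    """Invariant distribution of the random walk P via power iteration.
    Assumes P is row-stochastic, irreducible, and aperiodic."""
    phi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        phi = phi @ P
    return phi / phi.sum()

def directed_laplacian_basis(P, k):
    """Return the k smallest eigenvalues and eigenvectors of Chung's
    normalized directed Laplacian; the eigenvectors serve as
    smoothness-ordered basis functions for value approximation."""
    phi = stationary_distribution(P)
    s = np.sqrt(phi)
    A = s[:, None] * P / s[None, :]        # Phi^{1/2} P Phi^{-1/2}
    L = np.eye(P.shape[0]) - (A + A.T) / 2.0  # symmetric by construction
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]

# Toy non-reversible MDP: a directed 8-cycle with a shortcut out of state 0.
n = 8
walk = np.zeros((n, n))
for i in range(n):
    walk[i, (i + 1) % n] = 1.0             # deterministic clockwise step
walk[0] = 0.0
walk[0, 1] = 0.5
walk[0, n // 2] = 0.5                      # shortcut: walk is non-reversible

# Lazy walk (mix with identity) to ensure a unique invariant distribution.
P = 0.5 * np.eye(n) + 0.5 * walk

vals, basis = directed_laplacian_basis(P, k=3)
```

The smallest eigenvalue is zero (its eigenvector is Φ^{1/2} applied to the constant function), and the remaining columns of `basis` would be fed to a linear value-function approximator such as LSPI as state features.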