Basis functions derived from an undirected graph connecting nearby samples from a Markov decision process (MDP) have proven useful for approximating value functions. The success of this technique is attributed to the smoothness of the basis functions with respect to the state space geometry. This paper explores the properties of bases created from directed graphs, which are a more natural fit for expressing state connectivity. Digraphs capture the behavior of non-reversible MDPs, whose value functions may not be smooth across adjacent states. We provide an analysis using the Dirichlet sum of the directed graph Laplacian to show how the smoothness of the basis functions is affected by the graph's invariant distribution. Experiments in discrete and continuous MDPs with non-reversible actions demonstrate a significant improvement in the policies learned using directed graph bases.
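The construction the abstract refers to can be illustrated with a minimal sketch. The code below builds basis functions as the smoothest eigenvectors of Chung's normalized directed graph Laplacian, L = I − (Φ^{1/2} P Φ^{−1/2} + Φ^{−1/2} Pᵀ Φ^{1/2})/2, where P is a random walk on the digraph and Φ is the diagonal matrix of its invariant distribution. The toy graph (a directed cycle with one shortcut), the lazy-walk mixing used to guarantee a unique invariant distribution, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def stationary_distribution(P, iters=2000):
    """Invariant distribution of the random walk P via power iteration.
    Assumes P is row-stochastic, irreducible, and aperiodic."""
    phi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        phi = phi @ P
    return phi / phi.sum()

def directed_laplacian_basis(P, k):
    """Return the k smallest eigenvalues and eigenvectors of Chung's
    normalized directed Laplacian; the eigenvectors serve as
    smoothness-ordered basis functions for value approximation."""
    phi = stationary_distribution(P)
    s = np.sqrt(phi)
    A = s[:, None] * P / s[None, :]        # Phi^{1/2} P Phi^{-1/2}
    L = np.eye(P.shape[0]) - (A + A.T) / 2.0  # symmetric by construction
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]

# Toy non-reversible MDP: a directed 8-cycle with a shortcut out of state 0.
n = 8
walk = np.zeros((n, n))
for i in range(n):
    walk[i, (i + 1) % n] = 1.0             # deterministic clockwise step
walk[0] = 0.0
walk[0, 1] = 0.5
walk[0, n // 2] = 0.5                      # shortcut: walk is non-reversible

# Lazy walk (mix with identity) to ensure a unique invariant distribution.
P = 0.5 * np.eye(n) + 0.5 * walk

vals, basis = directed_laplacian_basis(P, k=3)
```

The smallest eigenvalue is zero (its eigenvector is Φ^{1/2} applied to the constant function), and the remaining columns of `basis` would be fed to a linear value-function approximator such as LSPI as state features.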