Expediting RL by using graphical structures

  • Authors:
  • Peng Dai; Alexander L. Strehl; Judy Goldsmith

  • Affiliations:
  • University of Washington, Seattle, WA; Yahoo! Research, New York; University of Kentucky, Lexington, KY

  • Venue:
  • Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3
  • Year:
  • 2008

Abstract

The goal of reinforcement learning (RL) is to maximize reward (minimize cost) in a Markov decision process (MDP) without knowing the underlying model a priori. RL algorithms tend to be much slower than planning algorithms, which require the model as input. Recent results demonstrate that MDP planning can be expedited by exploiting the graphical structure of the MDP. We present extensions to two popular RL algorithms, Q-learning and RMax, that learn and exploit the graphical structure of problems to improve overall learning speed. Exploiting the graphical structure of the underlying MDP can greatly speed up planning when the MDP has a nontrivial topological structure. Our experiments show that using the apparent topological structure of an MDP speeds up reinforcement learning even if the MDP is simply connected.
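To make the structural idea concrete, here is a minimal Python sketch of the planning-side technique the abstract alludes to: decomposing the MDP's transition graph into strongly connected components (SCCs) and backing up values one component at a time in reverse topological order, in the spirit of topological value iteration. This is an illustrative assumption, not the authors' implementation; the toy MDP and the names tarjan_sccs and topological_value_iteration are hypothetical.

    # Illustrative sketch only -- not the paper's algorithm or code.
    # States are grouped into SCCs of the transition graph, and Bellman
    # backups run over one component at a time, sinks first.
    from collections import defaultdict

    # Hypothetical toy deterministic MDP: state -> action -> (next_state, cost).
    TOY_MDP = {
        0: {"a": (1, 1.0), "b": (2, 2.0)},
        1: {"a": (3, 1.0)},
        2: {"a": (3, 1.0)},
        3: {"a": (3, 0.0)},  # absorbing zero-cost goal
    }

    def tarjan_sccs(graph):
        """Return the SCCs of `graph` in reverse topological order (Tarjan)."""
        index, lowlink, on_stack = {}, {}, set()
        stack, sccs, counter = [], [], [0]

        def strongconnect(v):
            index[v] = lowlink[v] = counter[0]
            counter[0] += 1
            stack.append(v)
            on_stack.add(v)
            for w in graph[v]:
                if w not in index:
                    strongconnect(w)
                    lowlink[v] = min(lowlink[v], lowlink[w])
                elif w in on_stack:
                    lowlink[v] = min(lowlink[v], index[w])
            if lowlink[v] == index[v]:
                scc = []
                while True:
                    w = stack.pop()
                    on_stack.discard(w)
                    scc.append(w)
                    if w == v:
                        break
                sccs.append(scc)

        for v in graph:
            if v not in index:
                strongconnect(v)
        return sccs  # Tarjan emits components sinks-first

    def topological_value_iteration(mdp, eps=1e-6):
        """Solve each SCC to convergence before moving to its ancestors."""
        graph = {s: {ns for (ns, _) in acts.values()} for s, acts in mdp.items()}
        values = defaultdict(float)
        for scc in tarjan_sccs(graph):  # successors are already solved
            while True:
                delta = 0.0
                for s in scc:
                    best = min(c + values[ns] for (ns, c) in mdp[s].values())
                    delta = max(delta, abs(best - values[s]))
                    values[s] = best
                if delta < eps:
                    break
        return dict(values)

    if __name__ == "__main__":
        print(topological_value_iteration(TOY_MDP))  # {3: 0.0, 1: 1.0, 2: 1.0, 0: 2.0}

Solving each component to convergence before its ancestors avoids repeatedly sweeping states whose values are already fixed, which is where the planning speedup comes from. The abstract's contribution is extending this idea to RL, where the graphical structure is not given and must be learned from experience.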