Dynamic programming: deterministic and stochastic models.
Continual learning in reinforcement environments.
Introduction to Reinforcement Learning.
Least-squares policy iteration. The Journal of Machine Learning Research.
Prioritization Methods for Accelerating MDP Solvers. The Journal of Machine Learning Research.
Sequential constant size compressors for reinforcement learning. AGI'11: Proceedings of the 4th International Conference on Artificial General Intelligence.
On the complexity of solving Markov decision problems. UAI'95: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence.
Future AGIs will need to solve large reinforcement-learning problems involving complex reward functions with multiple reward sources. One way to make progress on such problems is to decompose them into smaller regions that can be solved efficiently. We introduce a novel modular version of Least Squares Policy Iteration (LSPI), called M-LSPI, which (1) partitions a Markov decision problem (MDP) into a set of mutually exclusive regions, and (2) iteratively solves each region by a single matrix inversion, then combines the regional solutions by value iteration. The resulting algorithm leverages regional decomposition to solve the MDP efficiently. As the number of states increases, on both structured and unstructured MDPs, M-LSPI converges to the value function of the optimal policy substantially faster than traditional algorithms, especially as the discount factor approaches one.
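The abstract compresses the algorithm into two steps; the sketch below illustrates the general idea in Python on a small tabular MDP. It is a minimal illustration under simplifying assumptions, not the authors' implementation: M-LSPI performs its regional solves with LSPI's feature-based least-squares machinery, whereas here each regional solve is an exact tabular policy evaluation, and the function names (evaluate_region, modular_policy_iteration) are purely illustrative.

```python
import numpy as np


def evaluate_region(P, R, gamma, policy, region, V):
    """Exactly evaluate `policy` on the states in `region` with one matrix
    inversion, holding the value estimates of all other states fixed.

    Solves V_r = (I - gamma * P_rr)^(-1) (R_r + gamma * P_ro @ V_o),
    where r indexes states inside the region and o those outside it.
    """
    r = np.asarray(region)
    o = np.setdiff1d(np.arange(len(V)), r)
    P_pi = P[policy[r], r, :]            # region rows of P under the policy
    R_pi = R[policy[r], r]
    A = np.eye(len(r)) - gamma * P_pi[:, r]
    b = R_pi + gamma * P_pi[:, o] @ V[o]
    return np.linalg.solve(A, b)


def modular_policy_iteration(P, R, gamma, regions, tol=1e-8, max_sweeps=1000):
    """Policy iteration whose evaluation step solves each region separately
    and combines the regional solutions by value-iteration-style sweeps.

    P: (n_actions, n_states, n_states) transition tensor.
    R: (n_actions, n_states) rewards.
    regions: mutually exclusive lists of state indices covering all states.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(100):                  # outer policy-iteration loop
        # Policy evaluation: sweep over regions until the values agree
        # across region boundaries.
        for _ in range(max_sweeps):
            V_old = V.copy()
            for region in regions:
                V[np.asarray(region)] = evaluate_region(
                    P, R, gamma, policy, region, V)
            if np.max(np.abs(V - V_old)) < tol:
                break
        # Greedy policy improvement over the full state space.
        Q = R + gamma * P @ V             # (n_actions, n_states)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return V, policy


# Tiny random MDP: 6 states, 2 actions, two regions of 3 states each.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(6), size=(2, 6))   # P[a, s] is a distribution
R = rng.standard_normal((2, 6))
V, policy = modular_policy_iteration(P, R, 0.95, [[0, 1, 2], [3, 4, 5]])
print(V, policy)
```

The intuition for the reported scaling behavior is visible in the sketch: each inversion acts on a matrix of the region's size rather than the full state space, and since a dense solve costs roughly cubic time in the matrix dimension, many small regional solves plus cheap combining sweeps can beat one global solve as the number of states grows.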