Dynamic programming: deterministic and stochastic models.
Continual learning in reinforcement environments.
Introduction to Reinforcement Learning.
Least-squares policy iteration. The Journal of Machine Learning Research.
Prioritization Methods for Accelerating MDP Solvers. The Journal of Machine Learning Research.
Sequential constant size compressors for reinforcement learning. AGI'11: Proceedings of the 4th International Conference on Artificial General Intelligence.
On the complexity of solving Markov decision problems. UAI'95: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence.
Future AGIs will need to solve large reinforcement-learning problems involving complex reward functions with multiple reward sources. One way to make progress on such problems is to decompose them into smaller regions that can be solved efficiently. We introduce a novel modular version of Least Squares Policy Iteration (LSPI), called M-LSPI, which (1) partitions a Markov decision problem (MDP) into a set of mutually exclusive regions, and (2) iteratively solves each region by a single matrix inversion, then combines the regional solutions by value iteration. The resulting algorithm leverages regional decomposition to solve the MDP efficiently. As the number of states increases, on both structured and unstructured MDPs, M-LSPI converges to the value function of the optimal policy substantially faster than traditional algorithms, especially as the discount factor approaches one.
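The abstract compresses the algorithm into two steps; the sketch below illustrates the general idea in Python on a small tabular MDP. It is a minimal illustration under simplifying assumptions, not the authors' implementation: M-LSPI performs its regional solves with LSPI's feature-based least-squares machinery, whereas here each regional solve is an exact tabular policy evaluation, and the function names (evaluate_region, modular_policy_iteration) are purely illustrative.

```python
import numpy as np


def evaluate_region(P, R, gamma, policy, region, V):
    """Exactly evaluate `policy` on the states in `region` with one matrix
    inversion, holding the value estimates of all other states fixed.

    Solves V_r = (I - gamma * P_rr)^(-1) (R_r + gamma * P_ro @ V_o),
    where r indexes states inside the region and o those outside it.
    """
    r = np.asarray(region)
    o = np.setdiff1d(np.arange(len(V)), r)
    P_pi = P[policy[r], r, :]            # region rows of P under the policy
    R_pi = R[policy[r], r]
    A = np.eye(len(r)) - gamma * P_pi[:, r]
    b = R_pi + gamma * P_pi[:, o] @ V[o]
    return np.linalg.solve(A, b)


def modular_policy_iteration(P, R, gamma, regions, tol=1e-8, max_sweeps=1000):
    """Policy iteration whose evaluation step solves each region separately
    and combines the regional solutions by value-iteration-style sweeps.

    P: (n_actions, n_states, n_states) transition tensor.
    R: (n_actions, n_states) rewards.
    regions: mutually exclusive lists of state indices covering all states.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(100):                  # outer policy-iteration loop
        # Policy evaluation: sweep over regions until the values agree
        # across region boundaries.
        for _ in range(max_sweeps):
            V_old = V.copy()
            for region in regions:
                V[np.asarray(region)] = evaluate_region(
                    P, R, gamma, policy, region, V)
            if np.max(np.abs(V - V_old)) < tol:
                break
        # Greedy policy improvement over the full state space.
        Q = R + gamma * P @ V             # (n_actions, n_states)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return V, policy


# Tiny random MDP: 6 states, 2 actions, two regions of 3 states each.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(6), size=(2, 6))   # P[a, s] is a distribution
R = rng.standard_normal((2, 6))
V, policy = modular_policy_iteration(P, R, 0.95, [[0, 1, 2], [3, 4, 5]])
print(V, policy)
```

The intuition for the reported scaling behavior is visible in the sketch: each inversion acts on a matrix of the region's size rather than the full state space, and since a dense solve costs roughly cubic time in the matrix dimension, many small regional solves plus cheap combining sweeps can beat one global solve as the number of states grows.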