Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Kernel-Based Reinforcement Learning
Machine Learning
Recent Advances in Hierarchical Reinforcement Learning
Discrete Event Dynamic Systems
Near-Optimal Reinforcement Learning in Polynominal Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
R-max - a general polynomial time algorithm for near-optimal reinforcement learning
The Journal of Machine Learning Research
Least-squares policy iteration
The Journal of Machine Learning Research
Cover trees for nearest neighbor
ICML '06 Proceedings of the 23rd international conference on Machine learning
Hierarchical model-based reinforcement learning: R-max + MAXQ
Proceedings of the 25th international conference on Machine learning
Hierarchical reinforcement learning with the MAXQ value function decomposition
Journal of Artificial Intelligence Research
SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Model-based exploration in continuous state spaces
SARA'07 Proceedings of the 7th International conference on Abstraction, reformulation, and approximation
ECML'05 Proceedings of the 16th European conference on Machine Learning
On-Line model-based continuous state reinforcement learning using background knowledge
AI'12 Proceedings of the 25th Australasian joint conference on Advances in Artificial Intelligence
Automatic interface optimization through random exploration of available elements
Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication
Hi-index | 0.00 |
Innovations such as optimistic exploration, function approximation, and hierarchical decomposition have helped scale reinforcement learning to more complex environments, but these three ideas have rarely been studied together. This paper develops a unified framework that formalizes these algorithmic contributions as operators on learned models of the environment. Our formalism reveals some synergies among these innovations, and it suggests a straightforward way to compose them. The resulting algorithm, Fitted R-MAXQ, is the first to combine the function approximation of fitted algorithms, the efficient model-based exploration of R-MAX, and the hierarchical decompostion of MAXQ.