Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence.
Discovering Hierarchy in Reinforcement Learning with HEXQ. ICML '02: Proceedings of the Nineteenth International Conference on Machine Learning.
The MAXQ Method for Hierarchical Reinforcement Learning. ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning.
Hierarchical Knowledge-Based Process Planning in Manufacturing. Proceedings of the IFIP TC5/WG5.2 & WG5.3 Eleventh International PROLAMAT Conference on Digital Enterprise - New Challenges: Life-Cycle Approach to Management and Production.
Heuristic search value iteration for POMDPs. UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence.
A causal approach to hierarchical decomposition of factored MDPs. ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research.
Exploiting structure in policy construction. IJCAI '95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2.
Hierarchical algorithms for Markov decision processes have proven useful in problem domains with multiple subtasks. Although existing hierarchical approaches are strong in task decomposition, they are weak in task abstraction, which is more important for task analysis and modeling. In this paper, we propose a task-oriented design that strengthens task abstraction. Our approach learns an episodic task model from the problem domain; with this model, the planner achieves the same control effect as with the original model, using a more concise structure and with much improved performance. Our analysis and experimental evaluation show that our approach outperforms existing hierarchical algorithms such as MAXQ and HEXQ.
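For context, the MAXQ baseline cited above rests on a recursive value decomposition: the value of a composite task i in state s is the value of its best child action a plus a completion function, V(i, s) = max_a [ V(a, s) + C(i, s, a) ]. The following minimal Python sketch illustrates that recursion; it is not this paper's implementation, and the task names, integer states, and numeric values are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Task:
    name: str
    children: List["Task"] = field(default_factory=list)  # empty list => primitive action
    V: Dict[int, float] = field(default_factory=dict)     # V(a, s): primitive one-step values
    C: Dict[Tuple[int, str], float] = field(default_factory=dict)  # C(i, s, a): completion values

def value(task: Task, s: int) -> float:
    # MAXQ recursion: V(i, s) = max_a [ V(a, s) + C(i, s, a) ]
    if not task.children:              # primitive action: expected one-step reward
        return task.V.get(s, 0.0)
    return max(value(a, s) + task.C.get((s, a.name), 0.0)
               for a in task.children)

# Illustrative hierarchy (all numbers are made up):
north = Task("north", V={0: -1.0})
pickup = Task("pickup", V={0: -1.0})
navigate = Task("navigate", children=[north], C={(0, "north"): 5.0})
root = Task("root", children=[navigate, pickup], C={(0, "navigate"): 2.0})
print(value(root, 0))  # (-1.0 + 5.0) + 2.0 = 6.0

The paper's task-oriented design aims to replace such a hand-specified hierarchy with a learned episodic task model of more concise structure that yields the same control effect.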