Online planning for large MDPs with MAXQ decomposition

Authors:
Aijun Bai;Feng Wu;Xiaoping Chen
Affiliations:
Univ. of Sci. & Tech. of China, Hefei, Anhui, China;Univ. of Sci. & Tech. of China, Hefei, Anhui, China;Univ. of Sci. & Tech. of China, Hefei, Anhui, China
Venue:
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Year:
2012



Abstract

Markov decision processes (MDPs) provide an expressive framework for planning in stochastic domains. However, solving a large MDP exactly is often intractable due to the curse of dimensionality. Online algorithms reduce this computational cost by avoiding the computation of a policy for every possible state. Hierarchical decomposition is another promising way to scale MDP algorithms up to large domains by exploiting their underlying structure. In this paper, we combine the benefits of a general hierarchical structure based on the MAXQ value function decomposition with the power of heuristic and approximate techniques to develop an online planning framework called MAXQ-OP. The proposed framework provides a principled approach for programming autonomous agents in large stochastic domains. As the main benchmark for this research, we have been conducting a long-term case study in the RoboCup soccer simulation 2D domain, which is far larger than the domains usually studied in the literature. The case study showed that agents developed with this framework and the related techniques achieved outstanding performance, demonstrating the framework's scalability to very large domains.
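The two ideas the abstract combines, online search from the current state and the MAXQ value function decomposition, can be illustrated with a small sketch. It is not the authors' MAXQ-OP implementation. The corridor domain, the `root` and `("nav", t)` subtasks, the waypoints, the discount `GAMMA`, and the `heuristic` depth-cutoff estimate are all hypothetical. Transitions are assumed deterministic, so the completion term C(i, s, a) reduces to the discounted value of a single end state. Each composite task is evaluated recursively as Q(i, s, a) = V(a, s) + C(i, s, a), and only for states reachable from the current one within a depth limit.

```python
from typing import List, Tuple, Union

# Hypothetical toy domain (not from the paper): a 1-D corridor of cells 0..GOAL.
GOAL = 6
GAMMA = 0.9
WAYPOINTS = (2, 4, 6)          # hypothetical targets for Navigate(t) subtasks
PRIMITIVES = ("left", "right")

# A task is a primitive action ("left"/"right"), a Navigate subtask ("nav", t), or "root".
Task = Union[str, Tuple[str, int]]


def step(s: int, action: str) -> Tuple[int, float]:
    """Deterministic primitive transition: move one cell (clamped), reward -1 per step."""
    delta = 1 if action == "right" else -1
    return min(max(s + delta, 0), GOAL), -1.0


def children(task: Task, s: int) -> List[Task]:
    """Child tasks available to a composite task in state s."""
    if task == "root":
        return [("nav", t) for t in WAYPOINTS if t != s]
    return list(PRIMITIVES)  # Navigate(t) moves with primitive actions


def terminated(task: Task, s: int) -> bool:
    """Termination predicate of a composite task."""
    return s == (GOAL if task == "root" else task[1])


def heuristic(task: Task, s: int) -> Tuple[float, int, int]:
    """Hypothetical depth-cutoff estimate: assume the task's target is reached directly."""
    target = GOAL if task == "root" else task[1]
    n = abs(target - s)
    return -sum(GAMMA ** k for k in range(n)), target, n


def q_value(task: Task, s: int, a: Task, depth: int) -> Tuple[float, int, int]:
    """Q(task, s, a) = V(a, s) + C(task, s, a), with the resulting end state and step count."""
    v_a, s_a, n_a = evaluate(a, s, depth - 1)               # V(a, s): run child a to completion
    v_rest, s_end, n_rest = evaluate(task, s_a, depth - 1)  # value of finishing task from s_a
    # Deterministic transitions: C(task, s, a) = GAMMA**n_a * V(task, s_a).
    return v_a + GAMMA ** n_a * v_rest, s_end, n_a + n_rest


def evaluate(task: Task, s: int, depth: int) -> Tuple[float, int, int]:
    """Return (value, end_state, steps) of the greedy policy for `task`, starting from s only."""
    if task in PRIMITIVES:
        s2, r = step(s, task)
        return r, s2, 1
    if terminated(task, s):
        return 0.0, s, 0
    if depth <= 0:
        return heuristic(task, s)
    return max((q_value(task, s, a, depth) for a in children(task, s)), key=lambda q: q[0])


def plan(s: int, depth: int = 8) -> Task:
    """Online decision: choose the root subtask with the highest Q(root, s, a) for the current s."""
    return max(children("root", s), key=lambda a: q_value("root", s, a, depth)[0])
```

For example, `plan(5)` returns `("nav", 6)`, because navigating straight to the goal costs a single step. Calling `plan` at each decision point and executing the chosen subtask mirrors the online loop: values are computed only for states reachable from the current one, never for the whole state space. In a stochastic domain the completion term would instead be an expectation over the child's end states and durations, which a planner would generally have to approximate.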