A hierarchical representation policy iteration algorithm for reinforcement learning

Authors:
Jian Wang;Lei Zuo;Jian Wang;Xin Xu;Chun Li
Affiliations:
College of Mechatronics and Automation, National University of Defense Tech., Changsha, P.R. China;College of Mechatronics and Automation, National University of Defense Tech., Changsha, P.R. China;Xi'an Air Force Military Representative Office, China;College of Mechatronics and Automation, National University of Defense Tech., Changsha, P.R. China;College of Mechatronics and Automation, National University of Defense Tech., Changsha, P.R. China
Venue:
IScIDE'12 Proceedings of the third Sino-foreign-interchange conference on Intelligent Science and Intelligent Data Engineering
Year:
2012

Abstract

This paper presents a hierarchical representation policy iteration (HRPI) algorithm, which combines the representation policy iteration (RPI) algorithm with a state space decomposition method implemented as a binary tree. In HRPI, the state space is first decomposed into multiple sub-spaces according to an approximate value function. Local policies are then estimated on each sub-space, and a global near-optimal policy is obtained by combining these local policies. Simulation results indicate that the proposed method performs better than the conventional RPI algorithm.
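
The decompose-then-combine structure described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the 8-state chain, the hand-written approximate value function, the median split rule used at each tree node, and the one-step greedy rule that stands in for running RPI on each sub-space. All names (Node, build_tree, local_greedy_policy, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

ACTIONS = (-1, +1)  # assumed toy action set: step left or right on a chain


@dataclass
class Node:
    """One node of the binary decomposition tree (hypothetical structure)."""
    states: List[int]
    threshold: Optional[float] = None  # None marks a leaf (one sub-space)
    low: Optional["Node"] = None
    high: Optional["Node"] = None


def build_tree(states: List[int], value: Dict[int, float], depth: int) -> Node:
    """Recursively split states by an approximate value function.

    Assumption: the split threshold is the upper median of the values;
    the abstract does not specify the splitting rule.
    """
    node = Node(states=list(states))
    if depth == 0 or len(states) < 2:
        return node
    vals = sorted(value[s] for s in states)
    threshold = vals[len(vals) // 2]
    low = [s for s in states if value[s] < threshold]
    high = [s for s in states if value[s] >= threshold]
    if not low or not high:  # cannot split further, keep as a leaf
        return node
    node.threshold = threshold
    node.low = build_tree(low, value, depth - 1)
    node.high = build_tree(high, value, depth - 1)
    return node


def leaves(node: Node) -> List[Node]:
    """Collect the leaf sub-spaces from left (low values) to right."""
    if node.threshold is None:
        return [node]
    return leaves(node.low) + leaves(node.high)


def step(s: int, a: int, n_states: int) -> int:
    """Deterministic chain dynamics, clipped at both ends (assumed)."""
    return min(max(s + a, 0), n_states - 1)


def local_greedy_policy(states, value, n_states) -> Dict[int, int]:
    """Stand-in for local policy estimation: one-step greedy on the values.

    The paper estimates local policies with RPI; this placeholder only
    shows where that per-sub-space step would go.
    """
    return {
        s: max(ACTIONS, key=lambda a: value[step(s, a, n_states)])
        for s in states
    }


# Toy problem: 8-state chain with the goal at state 7 (assumed values).
N_STATES = 8
approx_value = {s: 0.9 ** (N_STATES - 1 - s) for s in range(N_STATES)}

tree = build_tree(list(range(N_STATES)), approx_value, depth=2)
sub_spaces = [leaf.states for leaf in leaves(tree)]

# Combine the local policies into one global policy keyed by state.
global_policy: Dict[int, int] = {}
for sub_space in sub_spaces:
    global_policy.update(local_greedy_policy(sub_space, approx_value, N_STATES))

print(sub_spaces)
print(global_policy)
```

In this sketch the sub-spaces are disjoint, so combining local policies is a plain dictionary merge. How HRPI handles transitions across sub-space boundaries is not stated in the abstract and is not modeled here.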