When Q-learning is applied to complex real-world problems, the learning process is often too long to be practical. The main cause is that Q-learning requires the agent to visit every state-action transition infinitely often for the Q values to converge. We propose a State-Cluster based Q-learning method to accelerate convergence and shorten the learning process. The method builds a State-Cluster for each state the agent reaches, according to the state trajectory the agent has wandered through. Under our algorithm, the State-Cluster of a state holds the acyclic shortest state paths from other states to that state. When a state's Q value is refined in one step of the agent, the refined value is immediately propagated back along these paths to all states in its State-Cluster, instead of requiring the agent to visit those states again. With the State-Cluster, more Q values are refined per step of the agent, which speeds up the convergence of the Q values. Experiments comparing against standard Q-learning demonstrate that the method is substantially more effective. Although aimed at Q-learning, the method is also applicable to most other reinforcement learning methods based on value-function iteration.
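The mechanism described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it grafts a simplified State-Cluster onto tabular Q-learning, storing for each state the first-seen predecessor edges (a stand-in for the acyclic shortest paths the paper maintains) and pushing a refined Q value back along those edges without the agent revisiting the predecessor states. The class name `StateClusterQ`, the learning parameters, and the propagation rule are all illustrative assumptions.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor

class StateClusterQ:
    """Sketch of State-Cluster based Q-learning (illustrative, not the paper's code)."""

    def __init__(self, actions):
        self.Q = defaultdict(float)       # Q[(state, action)] -> value
        # cluster[s] maps predecessor state -> action taken to reach s from it;
        # a simplified stand-in for the acyclic shortest state paths into s.
        self.cluster = defaultdict(dict)
        self.actions = actions

    def _record(self, s, a, s2):
        # Remember the first observed (hence shortest-seen) edge into s2.
        if s != s2 and s not in self.cluster[s2]:
            self.cluster[s2][s] = a

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update for the visited transition.
        best_next = max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.Q[(s, a)])
        self._record(s, a, s2)
        self._propagate(s)

    def _propagate(self, s):
        # Push the refined value of s back along stored paths, so predecessor
        # states are refined in the same agent step instead of on a revisit.
        frontier, visited = [s], {s}
        while frontier:
            cur = frontier.pop()
            best = max(self.Q[(cur, b)] for b in self.actions)
            for pred, a in self.cluster[cur].items():
                if pred in visited:
                    continue  # keep the propagation acyclic
                target = GAMMA * best  # zero intermediate reward assumed
                if target > self.Q[(pred, a)]:
                    self.Q[(pred, a)] = target
                    visited.add(pred)
                    frontier.append(pred)
```

On a 5-state chain with a single rewarded transition into the goal state, one pass of ordinary Q-learning refines only the final state-action pair, whereas the propagation step above also assigns discounted values to every earlier state on the recorded path in that same step.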