When Q-learning is applied to complex real-world problems, the learning process is often too long to be practical. The main cause is that Q-learning requires the agent to visit every state-action transition infinitely often for the Q values to converge. We propose a State-Cluster based Q-learning method to accelerate convergence and shorten the learning process. The method builds a State-Cluster for each state the agent reaches, according to the state trajectory the agent has wandered through. Under our algorithm, the State-Cluster of a state holds the acyclic shortest state paths from other states to that state. When a state's Q value is refined in one step of the agent, the refined value is immediately propagated back along these paths to all states in its State-Cluster, instead of requiring the agent to visit those states again. With the State-Cluster, more Q values are refined per step of the agent, which speeds up the convergence of the Q values. Experiments comparing against standard Q-learning demonstrate that the method is substantially more effective. Although aimed at Q-learning, the method is also applicable to most other reinforcement learning methods based on value-function iteration.
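The mechanism described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it grafts a simplified State-Cluster onto tabular Q-learning, storing for each state the first-seen predecessor edges (a stand-in for the acyclic shortest paths the paper maintains) and pushing a refined Q value back along those edges without the agent revisiting the predecessor states. The class name `StateClusterQ`, the learning parameters, and the propagation rule are all illustrative assumptions.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor

class StateClusterQ:
    """Sketch of State-Cluster based Q-learning (illustrative, not the paper's code)."""

    def __init__(self, actions):
        self.Q = defaultdict(float)       # Q[(state, action)] -> value
        # cluster[s] maps predecessor state -> action taken to reach s from it;
        # a simplified stand-in for the acyclic shortest state paths into s.
        self.cluster = defaultdict(dict)
        self.actions = actions

    def _record(self, s, a, s2):
        # Remember the first observed (hence shortest-seen) edge into s2.
        if s != s2 and s not in self.cluster[s2]:
            self.cluster[s2][s] = a

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update for the visited transition.
        best_next = max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.Q[(s, a)])
        self._record(s, a, s2)
        self._propagate(s)

    def _propagate(self, s):
        # Push the refined value of s back along stored paths, so predecessor
        # states are refined in the same agent step instead of on a revisit.
        frontier, visited = [s], {s}
        while frontier:
            cur = frontier.pop()
            best = max(self.Q[(cur, b)] for b in self.actions)
            for pred, a in self.cluster[cur].items():
                if pred in visited:
                    continue  # keep the propagation acyclic
                target = GAMMA * best  # zero intermediate reward assumed
                if target > self.Q[(pred, a)]:
                    self.Q[(pred, a)] = target
                    visited.add(pred)
                    frontier.append(pred)
```

On a 5-state chain with a single rewarded transition into the goal state, one pass of ordinary Q-learning refines only the final state-action pair, whereas the propagation step above also assigns discounted values to every earlier state on the recorded path in that same step.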