A new Q-learning algorithm based on the Metropolis criterion

Authors:
Maozu Guo; Yang Liu; J. Malec
Affiliations:
Dept. of Comput. Sci. & Eng., Harbin Inst. of Technol., China;-;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
2004

Citing 0
Cited 11

Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy
Neurocomputing

Meta-level Control of Multiagent Learning in Dynamic Repeated Resource Sharing Problems
PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence

Exploration and exploitation balance management in fuzzy reinforcement learning
Fuzzy Sets and Systems

Temporal difference learning and simulated annealing for optimal control: a case study
KES-AMSTA'08 Proceedings of the 2nd KES International conference on Agent and multi-agent systems: technologies and applications

A new Q-learning with generalized approximation spaces
ICNC'09 Proceedings of the 5th international conference on Natural computation

A Human-Robot Collaborative Reinforcement Learning Algorithm
Journal of Intelligent and Robotic Systems

Evaluating Q-learning policies for multi-objective foraging task in a multi-agent environment
ICIRA'10 Proceedings of the Third international conference on Intelligent robotics and applications - Volume Part II

A hybrid cognitive/reactive intelligent agent autonomous path planning technique in a networked-distributed unstructured environment for reinforcement learning
The Journal of Supercomputing

Exploration strategies for learning in multi-agent foraging
SEMCCO'11 Proceedings of the Second international conference on Swarm, Evolutionary, and Memetic Computing - Volume Part II

Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits
The Journal of Supercomputing

Backward Q-learning: The combination of Sarsa algorithm and Q-learning
Engineering Applications of Artificial Intelligence


Abstract

The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation drives the agent to a locally optimal policy quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm, even though it may accelerate learning and help the agent escape locally optimal policies. In this paper, finding the optimal policy in Q-learning is framed as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
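
The abstract does not spell out the selection rule, so the following is only a minimal sketch of how a Metropolis-style acceptance test could drive action selection in tabular Q-learning. It assumes one common formulation: a uniformly random candidate action is compared with the greedy action and accepted with probability exp((Q(s, candidate) - Q(s, greedy)) / T). The function names, the Q-table layout (a list of rows, one per state), and the default learning rate and discount factor are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def metropolis_action(q_row, temperature, rng=random):
    """Choose an action index from one Q-table row with a Metropolis-style test.

    Assumed formulation (not confirmed by the abstract): propose a uniformly
    random action and accept it over the greedy action with probability
    exp((Q[candidate] - Q[greedy]) / temperature).
    """
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    # delta is never positive, because the greedy action has the largest Q-value.
    delta = q_row[candidate] - q_row[greedy]
    if rng.random() < math.exp(delta / temperature):
        return candidate  # explore: the random proposal was accepted
    return greedy  # exploit: fall back to the greedy action


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update, applied in place to the table q."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

At a high temperature the acceptance probability is close to 1, so selection is nearly uniform; as the temperature approaches 0 the rule becomes greedy. In a training loop one would typically lower the temperature over time, for example by multiplying it by a constant factor below 1 after each episode; that geometric schedule is likewise an assumption here rather than something the abstract states.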