The improvement of Q-learning applied to imperfect information game

Authors:
Jing Lin;Xuan Wang;Lijiao Han;Jiajia Zhang;Xinxin Xu
Affiliations:
Intelligence Computing Research Center, HIT Shenzhen Graduate School, Shenzhen, China;Intelligence Computing Research Center, HIT Shenzhen Graduate School, Shenzhen, China;School of Management, Shenyang University of Technology, Shenyang, China;Intelligence Computing Research Center, HIT Shenzhen Graduate School, Shenzhen, China;Intelligence Computing Research Center, HIT Shenzhen Graduate School, Shenzhen, China
Venue:
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Year:
2009

Abstract

The standard Q-learning algorithm suffers from slow convergence and a tendency to settle in local optima. Truncated TD estimation computes returns efficiently, and simulated annealing increases the chance of exploration. To accelerate convergence and to avoid local optima, this paper combines the Q-learning algorithm with truncated TD estimation and simulated annealing. We apply the improved Q-learning algorithm to an imperfect information game (the SiGuo military chess game) and implement a self-learning imperfect information game system. Experimental results show that this system can dynamically adjust each weight describing the game state according to the results. Further, it speeds up the learning process, effectively simulates human intelligence by making reasonable moves, and significantly improves system performance.
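
The abstract does not give the update rules, so the following is only a minimal sketch of how the three ingredients might fit together, not the authors' implementation. It assumes a tabular Q-function (the paper instead adjusts weights describing the game state), a Metropolis-style acceptance rule for the simulated annealing step, and the standard backward recursion for a truncated TD(λ) return. All function and parameter names are hypothetical.

```python
import math
import random


def sa_select_action(q_row, temperature, rng):
    """Annealing-style exploration (assumed Metropolis criterion).

    Propose a random action and accept it instead of the greedy one
    with probability exp((Q[proposal] - Q[greedy]) / T). As T falls
    towards 0 the choice becomes purely greedy.
    """
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    proposal = rng.randrange(len(q_row))
    delta = q_row[proposal] - q_row[greedy]  # always <= 0
    if temperature > 0 and rng.random() < math.exp(delta / temperature):
        return proposal
    return greedy


def truncated_td_return(rewards, next_values, gamma, lam):
    """Truncated TD(lambda) return over a window of m = len(rewards) steps.

    rewards[k]     : reward received at step k of the window
    next_values[k] : current value estimate of the state reached after step k
    The last step bootstraps fully; earlier steps mix the running return
    and the value estimate with weights lam and (1 - lam).
    """
    z = rewards[-1] + gamma * next_values[-1]
    for k in range(len(rewards) - 2, -1, -1):
        z = rewards[k] + gamma * (lam * z + (1.0 - lam) * next_values[k])
    return z


def q_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha towards the given target."""
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def cool(temperature, rate=0.95):
    """Geometric cooling schedule (an assumption; other schedules work too)."""
    return temperature * rate
```

In use, the values passed as `next_values` would be `max(q[s])` for each successor state in the window, the result of `truncated_td_return` would be the `target` for `q_update`, and `cool` would be called once per episode so that exploration decreases as learning proceeds.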