Introduction to Reinforcement Learning
This paper contributes two new algorithms, PImGA and PIrlEA, which construct their populations online at each iteration. Unlike conventional EAs and GAs, which rely on the inefficient value iteration method, both algorithms use the more efficient policy iteration to search for optimal control actions (policies). They also differ from the standard EA/GA selection operator: rather than selecting an optimal policy directly, the agent learns a good (elite) policy from its parent population, and this learned policy becomes a member of the next population. Because each such policy is obtained with an optimal reinforcement learning algorithm and a greedy policy, every new population is built from policies that are at least as good as those of its parents; in other words, the offspring inherit their parents' elite abilities. Intuitively, for a finite problem the resulting population will contain near-optimal policies after a sufficient number of iterations. Our experiments show that both algorithms work well.
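The iteration scheme described above — evaluate the parent population, let the agent learn an elite policy via policy iteration (exact evaluation plus a greedy improvement step), and seed the next population with that learned policy — can be sketched as follows. This is a minimal illustration on a hypothetical toy MDP, not the paper's actual PImGA/PIrlEA implementation; the MDP, population size, and mutation rate are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 6, 3, 0.9

# Hypothetical toy MDP (not from the paper): deterministic transitions and rewards.
P = rng.integers(0, nS, size=(nS, nA))      # P[s, a] = next state
R = rng.random((nS, nA))                    # R[s, a] = immediate reward

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
    Ppi = np.zeros((nS, nS))
    Ppi[np.arange(nS), P[np.arange(nS), pi]] = 1.0
    Rpi = R[np.arange(nS), pi]
    return np.linalg.solve(np.eye(nS) - gamma * Ppi, Rpi)

def greedy(V):
    """One greedy policy-improvement step against the value function V."""
    Q = R + gamma * V[P]                    # Q[s, a] = R[s, a] + gamma * V[s']
    return Q.argmax(axis=1)

def hybrid_search(pop_size=8, generations=20):
    """Evolutionary loop whose 'learning' operator is policy improvement:
    the elite parent is improved greedily, and the improved (learned)
    policy seeds the next population."""
    pop = [rng.integers(0, nA, size=nS) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [evaluate(pi).mean() for pi in pop]
        parent = pop[int(np.argmax(fitness))]   # elite parent policy
        child = greedy(evaluate(parent))        # policy learned from the parent
        # Next population: the learned child plus mutated copies of it.
        pop = [child] + [
            np.where(rng.random(nS) < 0.2,
                     rng.integers(0, nA, size=nS), child)
            for _ in range(pop_size - 1)
        ]
    return pop[0]

pi_star = hybrid_search()
```

Because policy improvement never decreases a policy's value, the elite lineage is monotonically non-decreasing in fitness, which illustrates the paper's claim that offspring inherit (and improve on) their parents' abilities.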