Autonomous shaping via coevolutionary selection of training experience

Authors:
Marcin Szubert;Krzysztof Krawiec
Affiliations:
Institute of Computing Science, Poznan University of Technology, Poznań, Poland;Institute of Computing Science, Poznan University of Technology, Poznań, Poland
Venue:
PPSN'12 Proceedings of the 12th international conference on Parallel Problem Solving from Nature - Volume Part II
Year:
2012

Abstract

To acquire expert skills in a sequential decision-making domain that is too vast to be explored thoroughly, an intelligent agent has to be capable of inducing crucial knowledge from the most representative parts of that domain. One way to shape the learning process and guide the learner in the right direction is to select the parts that provide the best training experience. To realize this concept, we propose a shaping method that orchestrates the training by iteratively exposing the learner to subproblems generated autonomously from the original problem. The main novelty of the proposed approach consists in equating the learning process with search in the subproblem space and in employing a coevolutionary algorithm to perform this search. Each individual in the population encodes a sequence of subproblems and is evaluated by confronting the learner trained on that sequence with other learners shaped in the same way by other individuals. When applied to the game of Othello, temporal difference learning on the best-found subproblem sequence yields substantially better players than learning on the entire problem at once.
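
The search loop outlined in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the learner, the "game", the mutation operator, and all names and parameters (N_FEATURES, SEQ_LEN, train_learner, play, coevolve) are hypothetical stand-ins chosen only to keep the example self-contained. In particular, temporal difference learning is replaced by a simple weight update, and Othello by a comparison of distances to a hidden target vector. What the sketch does preserve is the structure: each individual is a sequence of subproblems, and its fitness is the round-robin score of the learner trained on that sequence against learners trained on the other individuals.

```python
import random

N_FEATURES = 4  # size of the toy learner's weight vector (hypothetical)
SEQ_LEN = 3     # number of subproblems per individual (hypothetical)


def random_subproblem(rng):
    """A toy 'subproblem': here just a vector the learner is pulled toward."""
    return [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]


def train_learner(sequence):
    """Stand-in for TD learning: visit the subproblems in order and move
    the weights halfway toward each one."""
    weights = [0.0] * N_FEATURES
    for subproblem in sequence:
        for i, value in enumerate(subproblem):
            weights[i] += 0.5 * (value - weights[i])
    return weights


def play(learner_a, learner_b, target):
    """Stand-in for a game: the learner closer to a hidden target wins.
    Returns the score of learner_a (1 win, 0.5 draw, 0 loss)."""
    err_a = sum((w - t) ** 2 for w, t in zip(learner_a, target))
    err_b = sum((w - t) ** 2 for w, t in zip(learner_b, target))
    if err_a < err_b:
        return 1.0
    if err_a > err_b:
        return 0.0
    return 0.5


def evaluate(population, target):
    """Fitness of each individual = round-robin score of its learner
    against the learners shaped by all other individuals."""
    learners = [train_learner(seq) for seq in population]
    return [
        sum(play(a, b, target) for j, b in enumerate(learners) if j != i)
        for i, a in enumerate(learners)
    ]


def mutate(sequence, rng):
    """Replace one randomly chosen subproblem in a copy of the sequence."""
    child = [list(sp) for sp in sequence]
    child[rng.randrange(len(child))] = random_subproblem(rng)
    return child


def coevolve(pop_size=8, generations=20, seed=0):
    """Evolve subproblem sequences; return the best one and its fitness."""
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]
    population = [
        [random_subproblem(rng) for _ in range(SEQ_LEN)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        fitness = evaluate(population, target)
        order = sorted(range(pop_size), key=lambda i: fitness[i], reverse=True)
        parents = [population[i] for i in order[: pop_size // 2]]
        offspring = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + offspring
    fitness = evaluate(population, target)
    best = max(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]
```

Calling coevolve() returns the highest-scoring subproblem sequence in the final population together with its round-robin score. The fitness is relative to the current population, so a sequence's score can change from one generation to the next even if the sequence itself does not.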