Artificial intelligence: a modern approach
Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Learning to Cooperate via Policy Search
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Asynchronous learning by emotions and cognition
ICSAB Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior: From Animals to Animats
Between MDPs and Semi-MDPs: Learning, Planning & Representing Knowledge at Multiple Temporal Scales
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research
The agent approach, as described in [9], aims to design "intelligent" behaviors. Yet Reinforcement Learning (RL) methods often fail when confronted with complex tasks. We therefore seek a methodology for the automated design of agents, within the framework of Markov Decision Processes, for cases where the global task can be decomposed into simpler, possibly concurrent, sub-tasks. Our main idea is to combine basic behaviors automatically using RL methods, which leads us to propose the two complementary mechanisms presented in this paper. The first mechanism builds a global policy as a weighted combination of reusable basic policies, the weights being learned by the agent (using Simulated Annealing in our case). An agent designed this way is highly scalable: without further refinement of the global behavior, it can automatically combine several instances of the same basic behavior to handle concurrent occurrences of the same sub-task. The second mechanism creates new basic behaviors for combination; it relies on an incremental learning method that builds on the approximate solution obtained by combining older behaviors.
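The two ingredients of the first mechanism, a weighted combination of basic policies and weights tuned by Simulated Annealing, can be sketched as follows. This is a minimal illustration under assumptions of our own: each basic policy is represented as a hypothetical function mapping a state to action preferences, and `evaluate` stands for any empirical estimate of the combined agent's return; neither interface is specified in the paper.

```python
import math
import random

def combine_policies(policies, weights, state):
    """Score each action by a weighted sum of the basic policies'
    action preferences, then act greedily (hypothetical interface:
    each policy maps a state to a dict of action preferences)."""
    scores = {}
    for w, policy in zip(weights, policies):
        for action, pref in policy(state).items():
            scores[action] = scores.get(action, 0.0) + w * pref
    return max(scores, key=scores.get)

def anneal_weights(evaluate, n_policies, steps=500, t0=1.0, cooling=0.98):
    """Learn the combination weights by Simulated Annealing, where
    `evaluate(weights)` is assumed to return the agent's empirical return."""
    weights = [random.random() for _ in range(n_policies)]
    best, best_score = weights[:], evaluate(weights)
    current_score, t = best_score, t0
    for _ in range(steps):
        # Perturb the current weight vector (kept non-negative).
        candidate = [max(0.0, w + random.gauss(0, 0.1)) for w in weights]
        score = evaluate(candidate)
        # Accept improvements always, degradations with Boltzmann probability.
        if score > current_score or random.random() < math.exp((score - current_score) / t):
            weights, current_score = candidate, score
            if score > best_score:
                best, best_score = candidate[:], score
        t *= cooling  # geometric cooling schedule
    return best
```

Because the weights, not the basic policies, are what is learned, the same basic behavior can appear several times in `policies` with independently annealed weights, which is how the scalability to concurrent occurrences of a sub-task would be obtained in this sketch.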