Automatic programming of behavior-based robots using reinforcement learning

Authors: Sridhar Mahadevan; Jonathan Connell
Affiliations: IBM T.J. Watson Research Center, Yorktown Heights, NY (both authors)
Venue: AAAI'91 Proceedings of the Ninth National Conference on Artificial Intelligence - Volume 2
Year: 1991

Abstract

This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error, using a performance feedback function as reinforcement. Two behavior-learning algorithms are described; both combine techniques for propagating reinforcement values temporally across actions and spatially across states. A behavior-based robot called OBELIX (see Figure 1) learns several component behaviors in an example box-pushing task. An experimental study using the robot suggests two conclusions. First, the learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program. Second, a behavior-based architecture is better than a monolithic architecture for learning the box-pushing task.
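
The abstract does not name the two algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a one-step Q-learning-style update for the temporal propagation and, for the spatial propagation, applies the same update to previously seen states within a small Hamming distance of the current one. The class `BehaviorLearner`, the `ACTIONS` set, the bit-tuple state encoding, and all parameter values are hypothetical illustrations, not details taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical action set; the abstract does not list the robot's actions.
ACTIONS = ("forward", "left", "right")


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class BehaviorLearner:
    """Learns one behavior from a scalar reinforcement signal (sketch only)."""

    def __init__(self, alpha=0.5, gamma=0.9, radius=1, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated utility
        self.seen = set()            # states visited so far
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.radius = radius         # Hamming radius for spatial generalization
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def best_value(self, state):
        """Highest estimated utility over all actions in `state`."""
        return max(self.q[(state, a)] for a in ACTIONS)

    def choose(self, state):
        """Epsilon-greedy action selection (trial and error)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Temporal propagation: the target folds in the discounted value
        # of the successor state, so reward flows back across actions.
        target = reward + self.gamma * self.best_value(next_state)
        self.seen.add(state)
        # Spatial propagation: apply the same update to every seen state
        # that lies within `radius` bits of the current state.
        for s in self.seen:
            if hamming(s, state) <= self.radius:
                key = (s, action)
                self.q[key] += self.alpha * (target - self.q[key])
```

In a behavior-based arrangement, one such learner would be instantiated per component behavior, each with its own feedback function, rather than a single learner for the whole task. How the behaviors are arbitrated is outside the scope of this sketch.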