Maximizing Reward in a Non-Stationary Mobile Robot Environment

  • Authors:
  • Dani Goldberg; Maja J. Matarić

  • Affiliations:
  • Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3891, danig@cs.cmu.edu
  • Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, mataric@cs.usc.edu

  • Venue:
  • Autonomous Agents and Multi-Agent Systems
  • Year:
  • 2003


Abstract

The ability of a robot to improve its performance on a task can be critical, especially in poorly known and non-stationary environments where the best action or strategy depends on the current state of the environment. In such systems, a good estimate of the current state of the environment is key to achieving high performance, however performance is quantified. In this paper, we present an approach to state estimation in poorly known and non-stationary mobile robot environments, focusing on its application to a mine collection scenario in which performance is quantified using reward maximization. The approach is based on augmented Markov models (AMMs), a sub-class of semi-Markov processes. We have developed an algorithm for incrementally constructing arbitrary-order AMMs on-line; it is used to capture the interaction dynamics between a robot and its environment in terms of the behavior sequences executed during the performance of a task. For the purposes of reward maximization in a non-stationary environment, multiple AMMs monitor events at different timescales and provide statistics used to select the AMM most likely to have a good estimate of the environmental state. AMMs with redundant or outdated information are discarded, while enough data is retained to avoid conforming to noise. This approach has been successfully implemented on a mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization performance criterion. We then incorporate our algorithm for state estimation using multiple AMMs, allowing the robot to select appropriate actions based on the estimated state of the environment. The approach is tested first with a physical robot in a non-stationary environment with an abrupt change, and then in simulation with a gradually shifting environment.
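To make the two core ideas of the abstract concrete, the sketch below shows (1) an incrementally built Markov model over discrete behavior labels and (2) a bank of such models spawned at different times, from which the most current adequately trained model is selected. This is a minimal illustration only: the paper's actual AMMs are arbitrary-order semi-Markov models with richer link and duration statistics, and all names here (`AMM`, `AMMBank`, `update`, `best_model`, the spawn and retention parameters) are hypothetical, not the authors' API.

```python
# Illustrative sketch only: a first-order simplification of an AMM,
# built incrementally on-line from the robot's executed behaviors.
from collections import defaultdict

class AMM:
    """First-order model: transition counts plus per-state reward stats."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[s][s'] = observed transitions
        self.reward_sum = defaultdict(float)  # total reward observed in each state
        self.visits = defaultdict(int)        # number of visits to each state
        self.prev = None                      # previously executed behavior
        self.total_events = 0                 # age of the model, in events

    def update(self, behavior, reward=0.0):
        """Record one executed behavior and the reward it produced."""
        if self.prev is not None:
            self.counts[self.prev][behavior] += 1
        self.visits[behavior] += 1
        self.reward_sum[behavior] += reward
        self.prev = behavior
        self.total_events += 1

    def transition_prob(self, s, s_next):
        out = sum(self.counts[s].values())
        return self.counts[s][s_next] / out if out else 0.0

    def mean_reward(self, s):
        return self.reward_sum[s] / self.visits[s] if self.visits[s] else 0.0

class AMMBank:
    """Keeps AMMs started at different times, so at least one model
    reflects the environment after a change; old models are discarded."""

    def __init__(self, spawn_every=50, max_models=5):
        self.models = [AMM()]
        self.spawn_every = spawn_every
        self.max_models = max_models

    def update(self, behavior, reward=0.0):
        for m in self.models:
            m.update(behavior, reward)
        # Periodically spawn a fresh model to track recent dynamics.
        if self.models[0].total_events % self.spawn_every == 0:
            self.models.append(AMM())
        # Drop the oldest model once its statistics are likely outdated.
        if len(self.models) > self.max_models:
            self.models.pop(0)

    def best_model(self, min_events=20):
        """Youngest model with enough data that it is not just fitting noise."""
        for m in reversed(self.models):
            if m.total_events >= min_events:
                return m
        return self.models[0]
```

A hypothetical usage pattern: after each behavior terminates, the robot reports it (with any reward collected) to the bank, then consults the selected model when choosing its next strategy.

```python
bank = AMMBank()
for behavior, reward in [("search", 0.0), ("collect", 1.0), ("search", 0.0)]:
    bank.update(behavior, reward)
model = bank.best_model()
print(model.transition_prob("search", "collect"))
```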