Maximizing Reward in a Non-Stationary Mobile Robot Environment

  • Authors:
  • Dani Goldberg; Maja J. Matarić

  • Affiliations:
  • Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3891, danig@cs.cmu.edu
  • Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, mataric@cs.usc.edu

  • Venue:
  • Autonomous Agents and Multi-Agent Systems
  • Year:
  • 2003


Abstract

The ability of a robot to improve its performance on a task can be critical, especially in poorly known and non-stationary environments where the best action or strategy depends on the current state of the environment. In such systems, a good estimate of the current state of the environment is key to achieving high performance, however performance is quantified. In this paper, we present an approach to state estimation in poorly known and non-stationary mobile robot environments, focusing on its application to a mine collection scenario in which performance is quantified using reward maximization. The approach is based on augmented Markov models (AMMs), a sub-class of semi-Markov processes. We have developed an algorithm for incrementally constructing arbitrary-order AMMs on-line; it is used to capture the interaction dynamics between a robot and its environment in terms of the behavior sequences executed during the performance of a task. For the purposes of reward maximization in a non-stationary environment, multiple AMMs monitor events at different timescales and provide statistics used to select the AMM most likely to have a good estimate of the environmental state. AMMs with redundant or outdated information are discarded, while enough data is retained to avoid conforming to noise. This approach has been successfully implemented on a mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization performance criterion. We then incorporate our algorithm for state estimation using multiple AMMs, allowing the robot to select appropriate actions based on the estimated state of the environment. The approach is tested first with a physical robot in a non-stationary environment with an abrupt change, and then in simulation with a gradually shifting environment.
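To make the two core ideas of the abstract concrete, the sketch below shows (1) an incrementally built Markov model over discrete behavior labels and (2) a bank of such models spawned at different times, from which the most current adequately trained model is selected. This is a minimal illustration only: the paper's actual AMMs are arbitrary-order semi-Markov models with richer link and duration statistics, and all names here (`AMM`, `AMMBank`, `update`, `best_model`, the spawn and retention parameters) are hypothetical, not the authors' API.

```python
# Illustrative sketch only: a first-order simplification of an AMM,
# built incrementally on-line from the robot's executed behaviors.
from collections import defaultdict

class AMM:
    """First-order model: transition counts plus per-state reward stats."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[s][s'] = observed transitions
        self.reward_sum = defaultdict(float)  # total reward observed in each state
        self.visits = defaultdict(int)        # number of visits to each state
        self.prev = None                      # previously executed behavior
        self.total_events = 0                 # age of the model, in events

    def update(self, behavior, reward=0.0):
        """Record one executed behavior and the reward it produced."""
        if self.prev is not None:
            self.counts[self.prev][behavior] += 1
        self.visits[behavior] += 1
        self.reward_sum[behavior] += reward
        self.prev = behavior
        self.total_events += 1

    def transition_prob(self, s, s_next):
        out = sum(self.counts[s].values())
        return self.counts[s][s_next] / out if out else 0.0

    def mean_reward(self, s):
        return self.reward_sum[s] / self.visits[s] if self.visits[s] else 0.0

class AMMBank:
    """Keeps AMMs started at different times, so at least one model
    reflects the environment after a change; old models are discarded."""

    def __init__(self, spawn_every=50, max_models=5):
        self.models = [AMM()]
        self.spawn_every = spawn_every
        self.max_models = max_models

    def update(self, behavior, reward=0.0):
        for m in self.models:
            m.update(behavior, reward)
        # Periodically spawn a fresh model to track recent dynamics.
        if self.models[0].total_events % self.spawn_every == 0:
            self.models.append(AMM())
        # Drop the oldest model once its statistics are likely outdated.
        if len(self.models) > self.max_models:
            self.models.pop(0)

    def best_model(self, min_events=20):
        """Youngest model with enough data that it is not just fitting noise."""
        for m in reversed(self.models):
            if m.total_events >= min_events:
                return m
        return self.models[0]
```

A hypothetical usage pattern: after each behavior terminates, the robot reports it (with any reward collected) to the bank, then consults the selected model when choosing its next strategy.

```python
bank = AMMBank()
for behavior, reward in [("search", 0.0), ("collect", 1.0), ("search", 0.0)]:
    bank.update(behavior, reward)
model = bank.best_model()
print(model.transition_prob("search", "collect"))
```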