Utility-based on-line exploration for repeated navigation in an embedded graph

Authors:
Shlomo Argamon-Engelson;Sarit Kraus;Sigalit Sina
Affiliations:
Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan 52900, Israel;Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan 52900, Israel and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA;Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan 52900, Israel
Venue:
Artificial Intelligence
Year:
1998

Abstract

In this paper, we address the tradeoff between exploration and exploitation for agents that need to learn more about the structure of their environment in order to perform more effectively. For example, a robot may need to learn the most efficient routes between important sites in its environment. We compare on-line and off-line exploration for a repeated task, in which the agent must perform a given task a number of times. Tasks are modeled as navigation on a graph embedded in the plane. This paper describes a utility-based on-line exploration algorithm for repeated tasks, which takes into account both the costs and the potential benefits (over future task repetitions) of different exploratory actions. Exploration is performed in a greedy fashion, with the locally optimal exploratory action performed on each task repetition. We experimentally evaluated our utility-based on-line algorithm against a heuristic search algorithm for off-line exploration as well as a randomized on-line exploration algorithm. We found that for a single repeated task, utility-based on-line exploration consistently outperforms the alternatives, unless the number of task repetitions is very high. In addition, we extended the algorithms to the case of multiple repeated tasks, where the agent has a different randomly chosen task to perform each time. Here too, we found that utility-based on-line exploration is often preferred.
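
The greedy, utility-based choice described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no formulas, so the utility model here (expected savings over the remaining repetitions minus the cost paid now) is an assumption, as are all names (`ExploratoryAction`, `expected_net_utility`, `choose_action`) and all numbers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExploratoryAction:
    """A hypothetical detour through an unexplored part of the graph."""
    name: str
    extra_cost: float      # added travel cost on the current repetition
    success_prob: float    # assumed chance the detour reveals a shorter route
    saving_per_run: float  # assumed cost saved on each later repetition if it does


def expected_net_utility(action: ExploratoryAction, remaining_runs: int) -> float:
    """Assumed utility: expected future savings minus the cost paid now."""
    expected_benefit = action.success_prob * action.saving_per_run * remaining_runs
    return expected_benefit - action.extra_cost


def choose_action(actions: Sequence[ExploratoryAction],
                  remaining_runs: int) -> Optional[ExploratoryAction]:
    """Greedy step: take the best action, or None to follow the known route."""
    best = max(actions,
               key=lambda a: expected_net_utility(a, remaining_runs),
               default=None)
    if best is None or expected_net_utility(best, remaining_runs) <= 0:
        return None  # no exploration is expected to pay for itself
    return best


# Illustrative candidates; the values are invented for this sketch.
candidates = [
    ExploratoryAction("north shortcut", extra_cost=4.0, success_prob=0.5, saving_per_run=2.0),
    ExploratoryAction("river crossing", extra_cost=10.0, success_prob=0.2, saving_per_run=6.0),
]
```

Under these invented numbers, `choose_action(candidates, 3)` returns `None` (exploit the known route), `choose_action(candidates, 10)` picks the cheap "north shortcut", and `choose_action(candidates, 100)` picks the costlier "river crossing". This shows how the number of remaining repetitions changes which exploratory action, if any, is worth its cost.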