On learning in agent-centered search

Authors:
Nathan R. Sturtevant;Vadim Bulitko;Yngvi Björnsson
Affiliations:
University of Alberta, Edmonton, Alberta, Canada;University of Alberta, Edmonton, Alberta, Canada;Reykjavik University, Reykjavik, Iceland
Venue:
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Year:
2010

Abstract

Since the introduction of the LRTA* algorithm, real-time heuristic search algorithms have generally followed the same plan-act-learn cycle: an agent plans one or several actions based on locally available information, executes them and then updates (i.e., learns) its heuristic function. Algorithm evaluation has almost exclusively been empirical with the results often being domain-specific and incomparable across papers. Even when unification and cross-algorithm comparisons have been carried out in a single paper, there was no understanding of how efficient the learning process was with respect to a theoretical optimum. This paper addresses the problem with two primary contributions. First, we formally define a lower bound on the amount of learning any heuristic-learning algorithm needs to do. This bound is based on the notion of heuristic depressions and allows us to have a domain-independent measure of learning efficiency across different algorithms. Second, using this measure we propose to learn "costs-so-far" (g-costs) instead of "costs-to-go" (h-costs). This allows us to quickly identify redundant paths and dead-end states, thereby leading to asymptotic performance improvement as well as one to two orders of magnitude convergence speed-ups in practice.
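To make the plan-act-learn cycle concrete, the following is a minimal sketch of the classic LRTA* h-cost update with a one-step lookahead. It illustrates the baseline the abstract refers to, not the g-cost learning method this paper proposes. The graph `GRAPH`, the function names `lrta_star_trial` and `run_until_converged`, and the all-zero initial heuristic are assumptions chosen for illustration.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
# Adjacency list: state -> [(neighbor, edge cost)]. Hypothetical example graph:
# a corridor A-B-C-G with a dead-end branch A-D, all edges of cost 1.
Graph = Dict[State, List[Tuple[State, float]]]

GRAPH: Graph = {
    "A": [("B", 1), ("D", 1)],
    "B": [("A", 1), ("C", 1)],
    "C": [("B", 1), ("G", 1)],
    "D": [("A", 1)],
    "G": [("C", 1)],
}


def lrta_star_trial(graph: Graph, start: State, goal: State,
                    h: Dict[State, float], max_steps: int = 10_000) -> List[State]:
    """Run one trial from start to goal, updating h in place.

    Returns the sequence of states the agent visited.
    """
    state = start
    path = [state]
    for _ in range(max_steps):
        if state == goal:
            return path
        # Plan: one-step lookahead over the immediate neighbors only.
        best_next, best_f = None, float("inf")
        for nxt, cost in graph[state]:
            f = cost + h.get(nxt, 0.0)  # unseen states default to h = 0
            if f < best_f:              # ties go to the first neighbor listed
                best_next, best_f = nxt, f
        if best_next is None:
            raise ValueError(f"state {state!r} has no successors")
        # Learn: raise h(state) to the best one-step estimate (never lower it).
        h[state] = max(h.get(state, 0.0), best_f)
        # Act: move to the neighbor that looked best.
        state = best_next
        path.append(state)
    raise RuntimeError("step limit reached before the goal")


def run_until_converged(graph: Graph, start: State, goal: State,
                        h: Dict[State, float], max_trials: int = 100) -> List[State]:
    """Repeat trials until one of them leaves h unchanged; return its path."""
    for _ in range(max_trials):
        before = dict(h)
        path = lrta_star_trial(graph, start, goal, h)
        if h == before:
            return path
    raise RuntimeError("heuristic did not converge within max_trials")
```

Calling `run_until_converged(GRAPH, "A", "G", {})` on this toy graph takes a few trials: the agent detours into the dead end `D` once, raises the heuristic values of the states it visits, and then settles on the path `A, B, C, G` with `h["A"]` equal to 3. The repeated updates needed to fill in underestimated heuristic values are the learning cost that the paper's lower bound is meant to measure.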