Using learned policies in heuristic-search planning

Authors:
SungWook Yoon; Alan Fern; Robert Givan
Affiliations:
Computer Science & Engineering, Arizona State University, Tempe, AZ; Computer Science Department, Oregon State University, Corvallis, OR; Electrical & Computer Engineering, Purdue University, West Lafayette, IN
Venue:
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Year:
2007

Citing 9

Taxonomic syntax for first order inference. Journal of the ACM (JACM)
Learning action strategies for planning domains. Artificial Intelligence
Neuro-Dynamic Programming
Learning Decision Lists. Machine Learning
Learning measures of progress for planning domains. AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3
The FF planning system: fast plan generation through heuristic search. Journal of Artificial Intelligence Research
Planning through stochastic local search and temporal action graphs in LPG. Journal of Artificial Intelligence Research
Limited discrepancy search. IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
Inductive policy selection for first-order MDPs. UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence

Cited 5

Scaling up heuristic planning with relational decision trees. Journal of Artificial Intelligence Research
Automatic construction of efficient multiple battery usage policies. IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Plan-based policies for efficient multiple battery load management. Journal of Artificial Intelligence Research
Learning policies for battery usage optimization in electric vehicles. ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Learning policies for battery usage optimization in electric vehicles. Machine Learning

Abstract

Many current state-of-the-art planners rely on forward heuristic search. The success of such search typically depends on heuristic distance-to-the-goal estimates derived from the plangraph. Such estimates are effective in guiding search for many domains, but there remain many other domains where current heuristics are inadequate to guide forward search effectively. In some of these domains, it is possible to learn reactive policies from example plans that solve many problems. However, due to the inductive nature of these learning techniques, the policies are often faulty, and fail to achieve high success rates. In this work, we consider how to effectively integrate imperfect learned policies with imperfect heuristics in order to improve over each alone. We propose a simple approach that uses the policy to augment the states expanded during each search step. In particular, during each search node expansion, we add not only its neighbors, but all the nodes along the trajectory followed by the policy from the node until some horizon. Empirical results show that our proposed approach benefits both of the leveraged automated techniques, learning and heuristic search, outperforming the state-of-the-art in most benchmark planning domains.
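
The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a greedy best-first search, a deterministic domain, and a policy that maps a state directly to its chosen successor state (or None when the policy has no suggestion). The names policy_augmented_search, successors, heuristic, policy, horizon, and max_expansions are all hypothetical and chosen for illustration only.

```python
import heapq
from itertools import count


def _path(parent, node):
    """Follow parent links back to the start and return the states in order."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def policy_augmented_search(start, is_goal, successors, heuristic, policy,
                            horizon, max_expansions=10000):
    """Greedy best-first search whose expansion step is augmented by a policy.

    Assumptions (not taken from the paper): states are hashable and not None,
    `successors(s)` yields neighbor states, `heuristic(s)` returns a number,
    and `policy(s)` returns the next state or None.
    """
    tie = count()                      # tie-breaker so states are never compared
    parent = {start: None}             # doubles as the set of generated states
    frontier = [(heuristic(start), next(tie), start)]
    expansions = 0

    while frontier and expansions < max_expansions:
        _, _, node = heapq.heappop(frontier)
        if is_goal(node):
            return _path(parent, node)
        expansions += 1

        # Ordinary expansion: the node's neighbors.
        candidates = [(node, s) for s in successors(node)]

        # Policy augmentation: every state on the trajectory that the policy
        # follows from this node, up to `horizon` steps.
        current = node
        for _ in range(horizon):
            nxt = policy(current)
            if nxt is None:
                break
            candidates.append((current, nxt))
            current = nxt

        # Add every newly seen state to the frontier, ranked by the heuristic.
        for prev, s in candidates:
            if s not in parent:
                parent[s] = prev
                heapq.heappush(frontier, (heuristic(s), next(tie), s))

    return None                        # frontier exhausted or budget spent
```

In this sketch a faulty policy cannot block the search, because the ordinary neighbors are always added alongside the policy trajectory; a good policy lets the search jump several steps ahead in a single expansion.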