Many current state-of-the-art planners rely on forward heuristic search. The success of such search typically depends on heuristic distance-to-the-goal estimates derived from the planning graph. Such estimates guide search effectively in many domains, but there remain many others where current heuristics are inadequate to guide forward search. In some of these domains, it is possible to learn reactive policies from example plans that solve many problems. However, due to the inductive nature of these learning techniques, the policies are often faulty and fail to achieve high success rates. In this work, we consider how to integrate imperfect learned policies with imperfect heuristics so as to improve over each alone. We propose a simple approach that uses the policy to augment the set of states added during each search step. In particular, when expanding a search node, we add not only its immediate neighbors, but also every node along the trajectory obtained by following the policy from that node, up to some fixed horizon. Empirical results show that the proposed approach benefits both of the underlying automated techniques, policy learning and heuristic search, outperforming the state of the art in most benchmark planning domains.
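The expansion rule described in the abstract is simple to state procedurally. Below is a minimal sketch in Python of what such a policy-augmented greedy best-first search might look like; the callables `successors`, `step`, `heuristic`, and `policy`, and their signatures, are assumptions of this sketch rather than an API from the paper.

```python
import heapq
import itertools

def policy_augmented_gbfs(start, is_goal, successors, step, heuristic, policy, horizon):
    """Greedy best-first search where each node expansion also enqueues every
    state along the learned policy's rollout from that node, up to `horizon`
    steps.  Assumed interfaces (hypothetical, not from the paper):
      successors(s) -> iterable of (action, next_state) pairs
      step(s, a)    -> next_state after applying action a in state s
      heuristic(s)  -> estimated distance-to-goal for state s
      policy(s)     -> action chosen by the learned policy, or None
    """
    tie = itertools.count()  # tie-breaker so the heap never compares raw states
    open_list = [(heuristic(start), next(tie), start, [])]
    seen = {start}
    while open_list:
        _, _, state, plan = heapq.heappop(open_list)
        if is_goal(state):
            return plan
        # Ordinary expansion: the node's immediate neighbors.
        candidates = [(s, plan + [a]) for a, s in successors(state)]
        # Policy augmentation: roll the learned policy out from this node
        # and add every state on the trajectory, up to the horizon.
        s, p = state, plan
        for _ in range(horizon):
            a = policy(s)
            if a is None:
                break
            s, p = step(s, a), p + [a]
            candidates.append((s, p))
        for s, p in candidates:
            if s not in seen:
                seen.add(s)
                heapq.heappush(open_list, (heuristic(s), next(tie), s, p))
    return None  # search space exhausted without reaching a goal
```

The key design point is that a faulty policy only needs to make progress for a few steps of the rollout: even if the full trajectory goes astray, any useful intermediate state it passes through enters the open list and can be re-ranked by the heuristic, so neither component has to be reliable on its own.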