Enforced hill-climbing is an effective deterministic hill-climbing technique that escapes local optima using breadth-first search (a process called "basin flooding"). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function that estimates expected cost to the goal but suffers from flaws, such as local optima and plateaus, that thwart straightforward greedy action choice. While breadth-first search is effective for exploring the basin around a local optimum in deterministic problems, in stochastic problems we instead dynamically build and solve a heuristic-based Markov decision process (MDP) model of the basin in order to find a good escape policy that exits the local optimum. Building this model requires integrating the heuristic into the MDP itself, because the local goal is to improve the heuristic. We evaluate our proposal on twenty-four recent probabilistic planning-competition benchmark domains and twelve probabilistically interesting problems from the recent literature. We show that stochastic enforced hill-climbing (SEH) produces better policies than greedy heuristic-following for value/cost functions derived in two very different ways: one obtained by applying deterministic heuristics to a deterministic relaxation, and another obtained by automatically learning Bellman-error features from domain-specific experience. Using the first type of heuristic, SEH generally outperforms all planners from the first three international probabilistic planning competitions.
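As background for the stochastic generalization, here is a minimal sketch of the deterministic enforced hill-climbing baseline the abstract describes; the callables `heuristic`, `successors`, and `is_goal` are hypothetical placeholders for whatever the planning problem supplies:

```python
from collections import deque

def enforced_hill_climbing(start, heuristic, successors, is_goal):
    """Deterministic enforced hill-climbing (a sketch).

    Greedily follow the heuristic; at a local optimum or plateau, run
    breadth-first search ("basin flooding") outward until a state with a
    strictly better heuristic value (or a goal state) is found.
    """
    state = start
    while not is_goal(state):
        h0 = heuristic(state)
        frontier, seen = deque([state]), {state}
        better = None
        while frontier and better is None:
            s = frontier.popleft()
            for s2 in successors(s):
                if s2 in seen:
                    continue
                seen.add(s2)
                if is_goal(s2) or heuristic(s2) < h0:
                    better = s2  # escaped the basin
                    break
                frontier.append(s2)
        if better is None:
            return None  # basin exhausted: no escape exists
        state = better
    return state
```

On a chain of states with a heuristic plateau in the middle, the BFS phase walks across the plateau and resumes greedy descent on the far side.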
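The stochastic escape step described above can be sketched as value iteration on a local MDP of the basin, with exit states made terminal and valued by the heuristic itself (this is the sense in which the heuristic is integrated into the MDP). All names here (`solve_basin_escape`, `transitions`, the uniform `step_cost`) are illustrative assumptions, not the paper's implementation:

```python
def solve_basin_escape(basin, exits, transitions, heuristic,
                       step_cost=1.0, iters=100):
    """Value iteration on a local basin MDP (a sketch).

    Interior basin states choose among stochastic actions;
    transitions(s) yields (action, [(prob, next_state), ...]) pairs.
    Exit states are terminal and valued by the heuristic, so the
    resulting policy minimizes expected step cost plus heuristic
    value at the basin's exit, i.e. it seeks a good escape.
    """
    V = {s: heuristic(s) for s in exits}
    for s in basin:
        V[s] = heuristic(s)  # initialize interior values
    for _ in range(iters):
        for s in basin:
            # Bellman backup: best expected cost over available actions.
            V[s] = min(step_cost + sum(p * V[s2] for p, s2 in outcomes)
                       for _, outcomes in transitions(s))
    # Extract a greedy escape policy from the converged values.
    policy = {}
    for s in basin:
        policy[s] = min(
            transitions(s),
            key=lambda ao: step_cost + sum(p * V[s2] for p, s2 in ao[1])
        )[0]
    return policy, V
```

For example, with one interior state and two exits, a sure step to a low-heuristic exit is preferred over a gamble that may land on a high-heuristic one.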