Symbolic generalization for on-line planning

Authors:
Zhengzhu Feng;Eric A. Hansen;Shlomo Zilberstein
Affiliations:
Computer Science Department, University of Massachusetts, Amherst, MA;Department of Computer Science and Engineering, Mississippi State University, Mississippi, MS;Computer Science Department, University of Massachusetts, Amherst, MA
Venue:
UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
Year:
2003

Abstract

Symbolic representations have been used successfully in off-line planning algorithms for Markov decision processes. We show that they can also improve the performance of on-line planners. In addition to reducing computation time, symbolic generalization can reduce the number of costly real-world interactions required for convergence. We introduce Symbolic Real-Time Dynamic Programming (sRTDP), an extension of RTDP. After each step of on-line interaction with an environment, sRTDP uses symbolic model-checking techniques to generalize its experience by updating a group of states rather than a single state. We examine two heuristic approaches to dynamic grouping of states and show that they accelerate the planning process significantly in terms of both CPU time and the number of steps of interaction with the environment.
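
The contrast between updating a single state and updating a group of states can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the toy "set every bit" problem, the names `similar_states` and `solve`, and the grouping rule (states that share the current state's value estimate) are all hypothetical, and plain Python lists of states stand in for the symbolic representation that sRTDP relies on.

```python
import itertools
import random

# Hypothetical toy problem: a state is a tuple of N bits, the goal is all ones.
N = 4
STATES = list(itertools.product((0, 1), repeat=N))
GOAL = (1,) * N
ACTIONS = range(N)   # action i tries to set bit i
P_SUCCESS = 0.8      # on failure the state is unchanged


def outcomes(state, action):
    """Return [(probability, next_state)] for one stochastic action."""
    flipped = state[:action] + (1,) + state[action + 1:]
    return [(P_SUCCESS, flipped), (1.0 - P_SUCCESS, state)]


def q_value(values, state, action):
    """Expected cost-to-go of `action` in `state`, with unit step cost."""
    return 1.0 + sum(p * values[s2] for p, s2 in outcomes(state, action))


def backup(values, state):
    """Bellman backup of one state; returns the greedy action used."""
    if state == GOAL:
        values[state] = 0.0
        return None
    best = min(ACTIONS, key=lambda a: q_value(values, state, a))
    values[state] = q_value(values, state, best)
    return best


def similar_states(values, state):
    """Assumed grouping rule: states whose estimate equals the current one."""
    target = values[state]
    return [s for s in STATES if values[s] == target]


def sample(state, action, rng):
    """Simulate one step of interaction with the environment."""
    probs, nexts = zip(*outcomes(state, action))
    return rng.choices(nexts, weights=probs)[0]


def solve(grouped, trials=300, seed=0, max_steps=1000):
    """Run RTDP-style trials; returns (value estimates, interaction steps)."""
    rng = random.Random(seed)
    values = {s: 0.0 for s in STATES}  # zero never overestimates the cost
    start, steps = (0,) * N, 0
    for _ in range(trials):
        state = start
        for _ in range(max_steps):
            if state == GOAL:
                break
            if grouped:
                # Generalize: back up every state grouped with the current one.
                for s in similar_states(values, state):
                    backup(values, s)
            # Plain RTDP step: back up the current state and act greedily.
            action = backup(values, state)
            state = sample(state, action, rng)
            steps += 1
    return values, steps


if __name__ == "__main__":
    for grouped in (False, True):
        values, steps = solve(grouped)
        print(grouped, round(values[(0,) * N], 3), steps)
```

With `grouped=False` the loop is ordinary trial-based RTDP; with `grouped=True` each interaction step also refreshes the estimates of the grouped states, which is the kind of generalization the abstract describes. In this toy problem each missing bit costs 1 / 0.8 = 1.25 expected steps, so the optimal value of the all-zero start state is 5.0, and both variants approach it from below.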