A survey of algorithmic methods for partially observed Markov decision processes
Annals of Operations Research
Artificial intelligence: a modern approach
Acting optimally in partially observable stochastic domains
AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Dynamic Programming
The Witness Algorithm: Solving Partially Observable Markov Decision Processes
Memory Approaches to Reinforcement Learning in Non-Markovian Domains
Adaptive Probabilistic Networks
Learning agents for uncertain environments (extended abstract)
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Complexity of finite-horizon Markov decision process problems
Journal of the ACM (JACM)
Automatic Segmentation of Sequences through Hierarchical Reinforcement Learning
Sequence Learning - Paradigms, Algorithms, and Applications
Reinforcement learning for POMDPs based on action values and stochastic optimization
Eighteenth national conference on Artificial intelligence
A POMDP formulation of preference elicitation problems
Eighteenth national conference on Artificial intelligence
Spoken dialogue management using probabilistic reasoning
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Heuristic anytime approaches to stochastic decision processes
Journal of Heuristics
Dynamics based control with PSRs
Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1
Compact, convex upper bound iteration for approximate POMDP planning
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Value-function approximations for partially observable Markov decision processes
Journal of Artificial Intelligence Research
Speeding up the convergence of value iteration in partially observable Markov decision processes
Journal of Artificial Intelligence Research
Restricted value iteration: theory and algorithms
Journal of Artificial Intelligence Research
A formal framework for speedup learning from problems and solutions
Journal of Artificial Intelligence Research
A model approximation scheme for planning in partially observable stochastic domains
Journal of Artificial Intelligence Research
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
A planning algorithm for predictive state representations
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Reinforcement learning in POMDPs without resets
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Probabilistic robot navigation in partially observable environments
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
A POMDP approximation algorithm that anticipates the need to observe
PRICAI'00 Proceedings of the 6th Pacific Rim international conference on Artificial intelligence
Q-learning with linear function approximation
COLT'07 Proceedings of the 20th annual conference on Learning theory
An overview of planning under uncertainty
Artificial intelligence today
Computing optimal policies for partially observable decision processes using compact representations
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
A heuristic variable grid solution method for POMDPs
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Incremental methods for computing bounds in partially observable Markov decision processes
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes
UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
Region-based approximations for planning in stochastic domains
UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
Some experiments with real-time decision algorithms
UAI'96 Proceedings of the Twelfth international conference on Uncertainty in artificial intelligence
The problem of making optimal decisions under uncertainty is central to Artificial Intelligence. If the state of the world is known at all times, the world can be modeled as a Markov Decision Process (MDP). MDPs have been studied extensively, and many methods are known for determining optimal courses of action, or policies. The more realistic case, in which state information is only partially observable, is modeled by Partially Observable Markov Decision Processes (POMDPs); these have received much less attention. The best exact algorithms for these problems can be very inefficient in both space and time. We introduce Smooth Partially Observable Value Approximation (SPOVA), a new approximation method that quickly yields good approximations and can improve them over time. This method can be combined with reinforcement learning methods, a combination that was very effective in our test cases.
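To make the idea concrete: the optimal POMDP value function is piecewise linear in the belief state (a maximum over a set of linear "alpha vector" functions), and a SPOVA-style approximation replaces that non-differentiable max with a smooth power-mean combination so the representation can be tuned by gradient-based reinforcement learning updates. The sketch below is an illustrative reconstruction under that assumption, not the paper's implementation; the alpha vectors, belief, and exponent `k` are hypothetical, and the smooth max assumes nonnegative dot products.

```python
import numpy as np

def smooth_value(belief, alphas, k=8.0):
    """Differentiable approximation of max_i (alpha_i . belief).

    Assumes the dot products are nonnegative (e.g. rewards shifted
    to be positive). As k -> infinity this tends to the exact max.
    """
    dots = alphas @ belief                 # one scalar per alpha vector
    return np.sum(dots ** k) ** (1.0 / k)  # smooth "soft max" of the dots

# Illustrative example: a two-state POMDP with two alpha vectors.
alphas = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
b = np.array([0.7, 0.3])                   # belief: P(state 1), P(state 2)

exact = np.max(alphas @ b)                 # piecewise-linear value: 0.7
smooth = smooth_value(b, alphas)           # slightly above 0.7
```

Because `smooth_value` is differentiable in the alpha vectors, its parameters can be adjusted by gradient descent on a temporal-difference error, which is the kind of reinforcement-learning combination the abstract refers to.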