Partially observable Markov decision processes with imprecise parameters

  • Authors:
  • Hideaki Itoh; Kiyohiko Nakamura

  • Affiliations:
  • Both authors: Department of Computational Intelligence and Systems Science, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259-G3-46 Nagatsuta-cho, Midori-ku, Yoko ...

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2007

Abstract

This study extends the framework of partially observable Markov decision processes (POMDPs) to allow their parameters, i.e., the probability values in the state transition functions and the observation functions, to be imprecisely specified. It is shown that this extension can reduce the computational costs associated with solving these problems. First, the new framework, POMDPs with imprecise parameters (POMDPIPs), is formulated. We consider (1) the interval case, in which each parameter is imprecisely specified by an interval indicating its possible values, and (2) the point-set case, in which each probability distribution is imprecisely specified by a set of possible distributions. Second, a new optimality criterion for POMDPIPs is introduced. As in POMDPs, the criterion is to regard a policy, i.e., an action-selection rule, as optimal if it maximizes the expected total reward. Because of the parameter imprecision, however, the expected total reward cannot be calculated precisely in POMDPIPs. Instead, we estimate the total reward by adopting arbitrary second-order beliefs, i.e., beliefs over the imprecisely specified state transition functions and observation functions. Although there are many possible choices for these second-order beliefs, we regard a policy as optimal as long as there is at least one such choice under which the policy maximizes the total reward. Thus a POMDPIP can have multiple optimal policies; we regard them as equally optimal and aim to obtain one of them. The computational cost of obtaining such an optimal policy can be reduced significantly by appropriately choosing which second-order beliefs to use when estimating the total reward. We provide an exact solution algorithm for POMDPIPs that does this efficiently. Third, the performance of such an optimal policy and the computational complexity of the algorithm are analyzed theoretically. Finally, empirical studies show that our algorithm quickly obtains satisfactory policies for many POMDPIPs.
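
The abstract distinguishes two ways of specifying the imprecise parameters. As a concrete illustration only (the class names, field names, and the is_consistent helper below are assumptions chosen for exposition, not the authors' implementation), the following Python sketch shows how the interval case and the point-set case might be represented, and how one candidate transition distribution, i.e., one possible choice of second-order belief, could be checked against the interval bounds.

```python
# Illustrative sketch only: names and structure are assumptions based on the
# abstract, not the authors' actual formulation or code.
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str
Observation = str


@dataclass
class IntervalPOMDPIP:
    """Interval case: each transition/observation probability is given as a
    [lower, upper] interval of possible values."""
    states: List[State]
    actions: List[Action]
    observations: List[Observation]
    # transition_bounds[(s, a, s_next)] = (lower, upper) bound on P(s_next | s, a)
    transition_bounds: Dict[Tuple[State, Action, State], Tuple[float, float]]
    # observation_bounds[(a, s_next, o)] = (lower, upper) bound on P(o | a, s_next)
    observation_bounds: Dict[Tuple[Action, State, Observation], Tuple[float, float]]


@dataclass
class PointSetPOMDPIP:
    """Point-set case: each conditional distribution is given as a finite set
    of candidate distributions."""
    states: List[State]
    actions: List[Action]
    observations: List[Observation]
    # transition_sets[(s, a)] = candidate distributions over the next state
    transition_sets: Dict[Tuple[State, Action], List[Dict[State, float]]]
    # observation_sets[(a, s_next)] = candidate distributions over observations
    observation_sets: Dict[Tuple[Action, State], List[Dict[Observation, float]]]


def is_consistent(model: IntervalPOMDPIP, s: State, a: Action,
                  dist: Dict[State, float], tol: float = 1e-9) -> bool:
    """Check that one concrete transition distribution (one possible choice of
    second-order belief) sums to 1 and respects the interval bounds."""
    if abs(sum(dist.get(sp, 0.0) for sp in model.states) - 1.0) > tol:
        return False
    for sp in model.states:
        lower, upper = model.transition_bounds.get((s, a, sp), (0.0, 1.0))
        if not (lower - tol <= dist.get(sp, 0.0) <= upper + tol):
            return False
    return True
```

In this reading, the paper's optimality criterion amounts to asking whether some consistent choice of distributions (one per imprecisely specified parameter) makes a given policy maximize the expected total reward; the authors' algorithm exploits the freedom in that choice to reduce computation.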