This study extends the framework of partially observable Markov decision processes (POMDPs) to allow their parameters, i.e., the probability values in the state transition functions and the observation functions, to be imprecisely specified. It is shown that this extension can reduce the computational cost of solving these problems. First, the new framework, POMDPs with imprecise parameters (POMDPIPs), is formulated. We consider (1) the interval case, in which each parameter is imprecisely specified by an interval of its possible values, and (2) the point-set case, in which each probability distribution is imprecisely specified by a set of possible distributions. Second, a new optimality criterion for POMDPIPs is introduced. As in POMDPs, a policy, i.e., an action-selection rule, is regarded as optimal if it maximizes the expected total reward. Because of the parameter imprecision, however, the expected total reward cannot be calculated precisely in POMDPIPs. Instead, we estimate it by adopting arbitrary second-order beliefs, i.e., beliefs over the imprecisely specified state transition functions and observation functions. Although there are many possible choices of second-order beliefs, we regard a policy as optimal as long as there is at least one such choice under which the policy maximizes the estimated total reward. Thus a POMDPIP can have multiple optimal policies; we regard them as equally optimal and aim to obtain one of them. By appropriately choosing the second-order beliefs used to estimate the total reward, the computational cost of obtaining such an optimal policy can be reduced significantly. We provide an exact solution algorithm for POMDPIPs that makes this choice efficiently. Third, the performance of such an optimal policy, as well as the computational complexity of the algorithm, is analyzed theoretically. Last, empirical studies show that our algorithm quickly obtains satisfactory policies for many POMDPIPs.
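To make the interval case concrete, the sketch below (a hypothetical illustration, not the authors' algorithm) shows one way a planner could instantiate a second-order belief: given interval bounds on the transition probabilities and a current value estimate for each successor state, it greedily picks the point distribution inside the intervals that maximizes the one-step lookahead value, then uses that distribution in an ordinary Bellman backup. The function names (`select_distribution`, `backup`), the discount factor, and the fully observed setting are assumptions made for brevity; the paper itself works with beliefs over partially observed states.

```python
# Hypothetical sketch: choosing a point distribution inside probability
# intervals (one possible "second-order belief") and using it in a backup.
from typing import Callable, Iterable, List, Tuple

def select_distribution(intervals: List[Tuple[float, float]],
                        values: List[float]) -> List[float]:
    """Greedily pick p with lo[i] <= p[i] <= hi[i] and sum(p) = 1 that
    maximizes sum_i p[i] * values[i] (an optimistic choice)."""
    lo = [l for l, _ in intervals]
    hi = [h for _, h in intervals]
    # The intervals must admit at least one probability distribution.
    assert sum(lo) - 1e-9 <= 1.0 <= sum(hi) + 1e-9
    p = list(lo)
    budget = 1.0 - sum(lo)
    # Assign the remaining mass to the highest-valued successors first.
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        extra = min(hi[i] - lo[i], budget)
        p[i] += extra
        budget -= extra
        if budget <= 1e-12:
            break
    return p

def backup(state: int,
           actions: Iterable[int],
           intervals_for: Callable[[int, int], List[Tuple[float, float]]],
           reward: Callable[[int, int], float],
           values: List[float],
           gamma: float = 0.95) -> float:
    """One Bellman backup using, per action, the selected point distribution."""
    best = float("-inf")
    for a in actions:
        p = select_distribution(intervals_for(state, a), values)
        q = reward(state, a) + gamma * sum(pi * v for pi, v in zip(p, values))
        best = max(best, q)
    return best

# Example: two successor states with interval transition probabilities.
p = select_distribution([(0.2, 0.6), (0.3, 0.8)], values=[1.0, 0.0])
# -> [0.6, 0.4]: the free probability mass is pushed toward the
#    higher-valued successor, within the given bounds.
```

The greedy allocation is one simple way to exploit the freedom in choosing a second-order belief; the paper's contribution is an exact POMDPIP algorithm that makes an analogous choice over belief states so as to reduce the overall cost of planning.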