This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably reducing the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that when the transition probability matrices are totally positive of order 2 (TP2) and the observation probabilities are monotone likelihood ratio (MLR) ordered, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.
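The structural conditions named above can be checked numerically. The following is a minimal sketch, not the paper's algorithm: it tests whether a matrix is TP2 (all second-order minors nonnegative), whether one information state MLR-dominates another, and applies one standard Bayesian filter update. The function names, the row-stochastic convention `transition[i][j] = P(next = j | current = i)`, and the example numbers are all assumptions chosen for illustration.

```python
from itertools import combinations


def is_tp2(matrix, tol=1e-12):
    """Return True if every 2x2 minor of `matrix` is nonnegative (TP2)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    for i, k in combinations(range(n_rows), 2):      # row pairs with i < k
        for j, l in combinations(range(n_cols), 2):  # column pairs with j < l
            minor = matrix[i][j] * matrix[k][l] - matrix[i][l] * matrix[k][j]
            if minor < -tol:
                return False
    return True


def mlr_dominates(p, q, tol=1e-12):
    """Return True if p MLR-dominates q: p[j]*q[i] >= p[i]*q[j] for all i < j."""
    return all(p[j] * q[i] - p[i] * q[j] >= -tol
               for i, j in combinations(range(len(p)), 2))


def filter_update(belief, transition, obs_likelihood):
    """One Bayesian update of the information state for a fixed observation.

    obs_likelihood[j] is the (assumed) probability of the observation
    given that the next state is j.
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    unnormalized = [obs_likelihood[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical 3-state example.
A = [[0.8, 0.2, 0.0],
     [0.3, 0.5, 0.2],
     [0.0, 0.3, 0.7]]
p = [0.1, 0.3, 0.6]
q = [0.5, 0.3, 0.2]
likelihood = [0.7, 0.2, 0.1]

p_next = filter_update(p, A, likelihood)
q_next = filter_update(q, A, likelihood)
```

In this example `A` is TP2 and `p` MLR-dominates `q`, and the updated states `p_next` and `q_next` keep the same ordering. Order preservation of this kind is what makes monotone structure in the information state usable in practice.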