Incremental methods for computing bounds in partially observable Markov decision processes

Authors:
Milos Hauskrecht
Affiliations:
MIT Laboratory for Computer Science, Cambridge, MA
Venue:
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Year:
1997

Citing 6
Cited 14

Computationally feasible bounds for partially observed Markov decision processes

Operations Research
Suboptimal policies, with bounds, for parameter adaptive decision processes

Operations Research
Dynamic Decision Making in Stochastic Partially Observable Domains: Ischemic Heart Disease Example

AIME '97 Proceedings of the 6th Conference on Artificial Intelligence in Medicine in Europe
Optimal Policies for Partially Observable Markov Decision Processes
Planning and control in stochastic domains with imperfect information
Approximating optimal policies for partially observable stochastic domains

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Alternative essences of intelligence

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Heuristic search value iteration for POMDPs

UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Compact, convex upper bound iteration for approximate POMDP planning

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Speeding up the convergence of value iteration in partially observable Markov decision processes

Journal of Artificial Intelligence Research
Decision-theoretic bidding based on learned density models in simultaneous, interacting auctions

Journal of Artificial Intelligence Research
Restricted value iteration: theory and algorithms

Journal of Artificial Intelligence Research
Anytime point-based approximations for large POMDPs

Journal of Artificial Intelligence Research
A model approximation scheme for planning in partially observable stochastic domains

Journal of Artificial Intelligence Research
An improved grid-based approximation algorithm for POMDPs

IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 1
A survey of collaborative filtering techniques

Advances in Artificial Intelligence
The cog project: building a humanoid robot

Computation for metaphors, analogy, and agents
A possibilistic model for qualitative sequential decision problems under uncertainty in partially observable environments

UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Solving POMDPs by searching in policy space

UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
A survey of point-based POMDP solvers

Autonomous Agents and Multi-Agent Systems

Abstract

Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse and test various incremental methods for computing bounds on the value function for control problems with infinite-horizon discounted criteria. The methods described and tested include novel incremental versions of the grid-based linear interpolation method and of the simple lower bound method with Sondik's updates. Both of these can work with arbitrary points of the belief space and can be enhanced by various heuristic point selection strategies. Also introduced is a new method for computing an initial upper bound: the fast informed bound method. This method is able to improve significantly on the standard and commonly used upper bound computed by the MDP-based method. The quality of the resulting bounds is tested on a maze navigation problem with 20 states, 6 actions and 8 observations.
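
The relationship between the MDP-based upper bound and the fast informed bound mentioned in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the two-state model, its numbers, and all function and variable names are hypothetical, and the update rules are written in the form in which these two bounds are usually stated. The MDP-based bound takes the maximum over next actions separately for every next state, as if the state became fully observable after one step. The fast informed bound takes the maximum once per observation, which can only give an equal or smaller value, so it is never looser when both iterations start from the same initial values.

```python
# Toy POMDP with made-up numbers (hypothetical, not the paper's maze problem).
# T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[s][a] = reward.
T = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.8, 0.2], [0.2, 0.8]],
     [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0],
     [0.0, 1.0]]
GAMMA = 0.9  # discount factor (assumed value)


def mdp_bound(T, R, gamma, iters=200):
    """Value iteration on the underlying fully observable MDP."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [max(row) for row in Q]  # best action chosen per next state
        Q = [[R[s][a] + gamma * sum(T[a][s][s2] * V[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return Q


def fast_informed_bound(T, O, R, gamma, iters=200):
    """Same iteration, but the next action is chosen once per observation."""
    n_s, n_a, n_o = len(R), len(R[0]), len(O[0][0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(
                  max(sum(T[a][s][s2] * O[a][s2][o] * Q[s2][a2]
                          for s2 in range(n_s))
                      for a2 in range(n_a))
                  for o in range(n_o))
              for a in range(n_a)] for s in range(n_s)]
    return Q


def upper_bound(Q, belief):
    """Bound at a belief point: best belief-weighted Q-vector."""
    n_a = len(Q[0])
    return max(sum(b * Q[s][a] for s, b in enumerate(belief))
               for a in range(n_a))
```

Calling `upper_bound(fast_informed_bound(T, O, R, GAMMA), [0.5, 0.5])` and `upper_bound(mdp_bound(T, R, GAMMA), [0.5, 0.5])` gives the two bounds at the uniform belief for this toy model. How much tighter the first one is depends on how informative the observations are.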