Markov Decision Processes with Observation Costs

  • Authors:
  • E. A. Hansen

  • Affiliations:
  • -

  • Venue:
  • -
  • Year:
  • 2001

Abstract

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process in which observation of the process state can be imperfect and/or costly. Although it provides an elegant model for control and planning problems that include information-gathering actions, the best current algorithms for POMDPs are computationally infeasible for all but small problems. One approach to this dilemma is to identify subsets of POMDPs that can be solved more efficiently than the general problem. This report describes a policy iteration algorithm that we prove converges to an optimal policy for any infinite-horizon POMDP for which it is optimal to acquire perfect information at finite intervals. For this subset of POMDPs, the value function can be represented by a value for each state of the Markov process -- the same representation used for completely observable MDPs -- and this simplification makes it possible to compute optimal policies efficiently for many problems in this class. The policy iteration algorithm we describe is a synthesis of ideas from dynamic programming and heuristic search.
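As context for the per-state value representation mentioned in the abstract: in the special case where it is optimal to acquire perfect state information at every step, the problem reduces to an ordinary finite MDP whose reward function has the observation cost folded in, and standard policy iteration applies directly. The sketch below illustrates only that reduced case; it is not the paper's algorithm, which combines dynamic programming with heuristic search. All names here (policy_iteration, P, R, gamma) and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Standard policy iteration for a finite MDP.

    P: transition matrices, shape (A, S, S); P[a, s, s2] = Pr(s2 | s, a).
    R: expected immediate rewards, shape (A, S). For an information-
       gathering action, the observation cost is assumed to have been
       subtracted from its rewards in advance.
    Returns a deterministic policy (one action per state) and its
    value function V -- one scalar value per state.
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    states = np.arange(n_states)
    while True:
        # Policy evaluation: solve V = R_pi + gamma * P_pi @ V exactly.
        P_pi = P[policy, states]                 # (S, S) rows chosen by pi
        R_pi = R[policy, states]                 # (S,)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: one-step greedy lookahead on V.
        Q = R + gamma * (P @ V)                  # (A, S) action values
        # Switch only on a strict improvement so the iteration terminates.
        improvable = Q.max(axis=0) > Q[policy, states] + 1e-12
        if not improvable.any():
            return policy, V
        policy = np.where(improvable, Q.argmax(axis=0), policy)

# Toy usage (hypothetical numbers): action 1 observes the state
# perfectly at cost 0.3, which appears as a penalty on its rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [1.5 - 0.3, 2.0 - 0.3]])
policy, V = policy_iteration(P, R)
print(policy, V)
```

Solving the evaluation step as a linear system is exact for small state spaces; for larger problems, iterative evaluation would typically be substituted.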