The complexity of Markov decision processes
Mathematics of Operations Research
A survey of algorithmic methods for partially observed Markov decision processes
Annals of Operations Research
Memoryless policies: theoretical limitations and practical results
SAB94: Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3
Finite-memory control of partially observable systems
Classifier fitness based on accuracy
Evolutionary Computation
Solving POMDPs by searching in policy space
UAI'98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes
UAI'97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
Region-based approximations for planning in stochastic domains
UAI'97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
Complexity of finite-horizon Markov decision process problems
Journal of the ACM (JACM)
The complexity of decentralized control of Markov decision processes
Mathematics of Operations Research
Nonapproximability results for partially observable Markov decision processes
Journal of Artificial Intelligence Research
Complexity of probabilistic planning under average rewards
IJCAI'01: Proceedings of the 17th International Joint Conference on Artificial Intelligence, Volume 1
Conformant plans and beyond: Principles and complexity
Artificial Intelligence
We consider the problem of finding good finite-horizon policies for POMDPs under the expected reward metric. The policies considered are free finite-memory policies with limited memory; a policy is a mapping from the space of observation-memory pairs to the space of action-memory pairs (the policy updates its memory as it goes), and the number of possible memory states is a parameter of the input to the policy-finding algorithms. The algorithms considered here are preliminary implementations of three search heuristics: local search, simulated annealing, and genetic algorithms. We compare their outcomes to each other and to the optimal policies for each instance. We also compare the run times of each algorithm to those of a dynamic programming algorithm for POMDPs developed by Hansen that iteratively improves a finite-state controller, the previous state of the art for finite-memory policies. The value of the best policy can only improve as the amount of memory increases, up to the amount needed for an optimal finite-memory policy. Our most surprising finding is that more memory helps in another way: given more memory than is needed for an optimal policy, the algorithms are more likely to converge to optimal-valued policies.
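
To make the policy representation concrete, the following is a minimal sketch, not taken from the paper: it encodes a free finite-memory policy as a table from (observation, memory) pairs to (action, memory) pairs and improves it with the simplest of the three heuristics, plain local search (hill climbing). The two-state POMDP, the reward numbers, and the Monte Carlo evaluation are all invented for illustration; the constant N_MEM plays the role of the memory-size parameter described in the abstract.

import random

random.seed(0)

N_OBS, N_ACTIONS, N_MEM = 2, 2, 2   # N_MEM: the memory-size parameter
HORIZON, N_ROLLOUTS = 10, 200

# Invented two-state POMDP: T[s][a] is the next-state distribution,
# O[s] the observation distribution, R[s][a] the immediate reward.
T = {0: {0: [0.9, 0.1], 1: [0.1, 0.9]},
     1: {0: [0.1, 0.9], 1: [0.9, 0.1]}}
O = {0: [0.85, 0.15], 1: [0.15, 0.85]}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

def sample(dist):
    return random.choices(range(len(dist)), weights=dist)[0]

def random_policy():
    # A free finite-memory policy: (observation, memory) -> (action, new memory).
    return {(o, m): (random.randrange(N_ACTIONS), random.randrange(N_MEM))
            for o in range(N_OBS) for m in range(N_MEM)}

def estimate_value(policy):
    # Noisy Monte Carlo estimate of expected total finite-horizon reward;
    # small instances like those in the paper would admit exact evaluation.
    total = 0.0
    for _ in range(N_ROLLOUTS):
        s, m = 0, 0                       # fixed initial state and memory
        for _ in range(HORIZON):
            a, m = policy[(sample(O[s]), m)]
            total += R[s][a]
            s = sample(T[s][a])
    return total / N_ROLLOUTS

def local_search(iters=300):
    # Hill climbing: mutate one table entry; keep it only if value does not drop.
    policy = random_policy()
    best = estimate_value(policy)
    for _ in range(iters):
        key = random.choice(list(policy))
        saved = policy[key]
        policy[key] = (random.randrange(N_ACTIONS), random.randrange(N_MEM))
        value = estimate_value(policy)
        if value >= best:
            best = value
        else:
            policy[key] = saved           # revert a harmful mutation
    return policy, best

if __name__ == "__main__":
    _, value = local_search()
    print(f"best estimated value over horizon {HORIZON}: {value:.2f}")

Simulated annealing and a genetic algorithm would differ only in how candidate mutations are accepted or recombined; the table representation, and hence the effect of raising N_MEM, stays the same.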