The complexity of Markov decision processes
Mathematics of Operations Research
A survey of algorithmic methods for partially observed Markov decision processes
Annals of Operations Research
Memoryless policies: theoretical limitations and practical results
SAB94: Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3
Finite-memory control of partially observable systems
Classifier fitness based on accuracy
Evolutionary Computation
Solving POMDPs by searching in policy space
UAI'98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes
UAI'97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
Region-based approximations for planning in stochastic domains
UAI'97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
Complexity of finite-horizon Markov decision process problems
Journal of the ACM (JACM)
The complexity of decentralized control of Markov decision processes
Mathematics of Operations Research
Nonapproximability results for partially observable Markov decision processes
Journal of Artificial Intelligence Research
Complexity of probabilistic planning under average rewards
IJCAI'01: Proceedings of the 17th International Joint Conference on Artificial Intelligence, Volume 1
Conformant plans and beyond: Principles and complexity
Artificial Intelligence
We consider the problem of finding good finite-horizon policies for POMDPs under the expected reward metric. The policies considered are free finite-memory policies with limited memory; a policy is a mapping from the space of observation-memory pairs to the space of action-memory pairs (the policy updates its memory as it goes), and the number of possible memory states is a parameter of the input to the policy-finding algorithms. The algorithms considered here are preliminary implementations of three search heuristics: local search, simulated annealing, and genetic algorithms. We compare their outcomes to each other and to the optimal policies for each instance. We also compare the run times of each algorithm to those of a dynamic programming algorithm for POMDPs developed by Hansen that iteratively improves a finite-state controller, the previous state of the art for finite-memory policies. The value of the best policy can only improve as the amount of memory increases, up to the amount needed for an optimal finite-memory policy. Our most surprising finding is that more memory helps in another way: given more memory than is needed for an optimal policy, the algorithms are more likely to converge to optimal-valued policies.
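
To make the policy representation concrete, the following is a minimal sketch, not taken from the paper: it encodes a free finite-memory policy as a table from (observation, memory) pairs to (action, memory) pairs and improves it with the simplest of the three heuristics, plain local search (hill climbing). The two-state POMDP, the reward numbers, and the Monte Carlo evaluation are all invented for illustration; the constant N_MEM plays the role of the memory-size parameter described in the abstract.

import random

random.seed(0)

N_OBS, N_ACTIONS, N_MEM = 2, 2, 2   # N_MEM: the memory-size parameter
HORIZON, N_ROLLOUTS = 10, 200

# Invented two-state POMDP: T[s][a] is the next-state distribution,
# O[s] the observation distribution, R[s][a] the immediate reward.
T = {0: {0: [0.9, 0.1], 1: [0.1, 0.9]},
     1: {0: [0.1, 0.9], 1: [0.9, 0.1]}}
O = {0: [0.85, 0.15], 1: [0.15, 0.85]}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

def sample(dist):
    return random.choices(range(len(dist)), weights=dist)[0]

def random_policy():
    # A free finite-memory policy: (observation, memory) -> (action, new memory).
    return {(o, m): (random.randrange(N_ACTIONS), random.randrange(N_MEM))
            for o in range(N_OBS) for m in range(N_MEM)}

def estimate_value(policy):
    # Noisy Monte Carlo estimate of expected total finite-horizon reward;
    # small instances like those in the paper would admit exact evaluation.
    total = 0.0
    for _ in range(N_ROLLOUTS):
        s, m = 0, 0                       # fixed initial state and memory
        for _ in range(HORIZON):
            a, m = policy[(sample(O[s]), m)]
            total += R[s][a]
            s = sample(T[s][a])
    return total / N_ROLLOUTS

def local_search(iters=300):
    # Hill climbing: mutate one table entry; keep it only if value does not drop.
    policy = random_policy()
    best = estimate_value(policy)
    for _ in range(iters):
        key = random.choice(list(policy))
        saved = policy[key]
        policy[key] = (random.randrange(N_ACTIONS), random.randrange(N_MEM))
        value = estimate_value(policy)
        if value >= best:
            best = value
        else:
            policy[key] = saved           # revert a harmful mutation
    return policy, best

if __name__ == "__main__":
    _, value = local_search()
    print(f"best estimated value over horizon {HORIZON}: {value:.2f}")

Simulated annealing and a genetic algorithm would differ only in how candidate mutations are accepted or recombined; the table representation, and hence the effect of raising N_MEM, stays the same.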