Solving partially observable Markov decision processes (POMDPs) is a complex task that is often intractable. This paper examines the problem of finding an optimal policy for POMDPs. While considerable effort has been devoted to developing algorithms that solve POMDPs, the question of automatically finding good low-dimensional spaces in multi-agent cooperative learning domains has not been explored thoroughly. To address this question, an online algorithm, CMEAS, is presented for the POMDP model. The algorithm performs a look-ahead search to find the best action to execute at each cycle, thereby avoiding the overwhelming complexity of computing a policy for every possible situation. A series of simulations demonstrates the soundness of this strategy and the performance of the proposed algorithm when multiple agents cooperate to find an optimal policy for POMDPs.
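The abstract does not give pseudocode for CMEAS, but the core idea it describes — a depth-limited look-ahead search over beliefs to pick the best action at each cycle, instead of computing a full policy offline — can be sketched as follows. This is a minimal single-agent illustration on the classic tiger problem; the specific model (two states, 85% listening accuracy, discount 0.95, reward values) is an assumption for the example and is not taken from the paper.

```python
import math

STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBSERVATIONS = ["hear-left", "hear-right"]
GAMMA = 0.95  # illustrative discount factor, not from the paper

def T(a, s, s2):
    """Transition Pr(s' | s, a): listening leaves the tiger in place;
    opening a door resets its position uniformly at random."""
    if a == "listen":
        return 1.0 if s == s2 else 0.0
    return 0.5

def O(a, s2, o):
    """Observation Pr(o | s', a): listening is 85% accurate;
    observations after opening a door carry no information."""
    if a == "listen":
        correct = (s2 == "tiger-left") == (o == "hear-left")
        return 0.85 if correct else 0.15
    return 0.5

def R(s, a):
    """Reward: small cost to listen, large penalty for the tiger's door."""
    if a == "listen":
        return -1.0
    if (a == "open-left") == (s == "tiger-left"):
        return -100.0  # opened the door hiding the tiger
    return 10.0

def belief_update(b, a, o):
    """Bayes update b'(s') ∝ O(o|s',a) · Σ_s T(s'|s,a) b(s).
    Returns the new belief and the normalizer Pr(o | b, a)."""
    unnorm = {s2: O(a, s2, o) * sum(T(a, s, s2) * b[s] for s in STATES)
              for s2 in STATES}
    z = sum(unnorm.values())
    return {s2: p / z for s2, p in unnorm.items()}, z

def lookahead(b, depth):
    """Depth-limited look-ahead from belief b.
    Returns (value, best action); leaf beliefs are valued at 0."""
    if depth == 0:
        return 0.0, None
    best_val, best_a = -math.inf, None
    for a in ACTIONS:
        val = sum(b[s] * R(s, a) for s in STATES)  # expected immediate reward
        for o in OBSERVATIONS:
            b2, p_o = belief_update(b, a, o)  # p_o = Pr(o | b, a)
            if p_o > 0.0:
                val += GAMMA * p_o * lookahead(b2, depth - 1)[0]
        if val > best_val:
            best_val, best_a = val, a
    return best_val, best_a
```

With a uniform belief and a search depth of 2, the agent prefers to listen (opening a door has expected reward -45), whereas with a belief of 0.95 that the tiger is on the left and depth 1 it opens the right door. The branching factor is |A|·|O| per level, which is why online methods like the one described keep the horizon short rather than solving for a policy over the whole belief space.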