Solving Large-Scale and Sparse-Reward DEC-POMDPs with Correlation-MDPs

Authors:
Feng Wu; Xiaoping Chen
Affiliations:
Multi-Agent Systems Lab, Department of Computer Science, University of Science and Technology of China, Hefei, China 230026; Multi-Agent Systems Lab, Department of Computer Science, University of Science and Technology of China, Hefei, China 230026
Venue:
RoboCup 2007: Robot Soccer World Cup XI
Year:
2008



Abstract

Within a group of cooperating agents, the decision making of an individual agent depends on the actions of the other agents. Much effort has gone into solving this problem under additional assumptions about the agents' communication abilities. However, in some real-world applications communication is limited and these assumptions are rarely satisfied. A recently developed alternative is to employ a correlation device that correlates the agents' behavior without any exchange of information during execution. In this paper, we apply a correlation device to large-scale and sparse-reward domains. As a basis, we use the framework of infinite-horizon DEC-POMDPs, in which policies are represented as joint stochastic finite-state controllers. To solve a problem of this kind, a correlation device is first computed by solving Correlation Markov Decision Processes (Correlation-MDPs) and then used to improve the local controller of each agent. This method lets us trade off computational complexity against the quality of the approximation. In addition, we demonstrate that adversarial problems can be solved by encoding information about the opponents' behavior in the correlation device. We have implemented the proposed method in our 2D simulated robot soccer team, and its performance in RoboCup-2006 was encouraging.
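To make the idea of a correlation device concrete, the following is a minimal sketch in Python, not the authors' implementation. It illustrates only the general mechanism: a finite-state signal source whose transitions are independent of the agents, and stochastic local controllers that condition their action choice and node transition on the current signal. It does not show how the device is computed from a Correlation-MDP. All names (`CorrelationDevice`, `LocalController`, `run`) and the toy two-signal, one-node problem are assumptions made for illustration. Giving every agent an identically seeded copy of the device is one assumed way to share the signal without exchanging messages at execution time.

```python
import random


def sample(rng, dist):
    """Draw one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


class CorrelationDevice:
    """Finite-state signal source; its transitions ignore agents and environment."""

    def __init__(self, transitions, start, seed):
        self.transitions = transitions  # signal -> {next_signal: probability}
        self.state = start
        self.rng = random.Random(seed)

    def step(self):
        self.state = sample(self.rng, self.transitions[self.state])
        return self.state


class LocalController:
    """Stochastic finite-state controller conditioned on the shared signal."""

    def __init__(self, action_dist, node_dist, start_node, seed):
        self.action_dist = action_dist  # (node, signal) -> {action: probability}
        self.node_dist = node_dist      # (node, action, obs, signal) -> {node: probability}
        self.node = start_node
        self.rng = random.Random(seed)

    def act(self, signal):
        return sample(self.rng, self.action_dist[(self.node, signal)])

    def update(self, action, observation, signal):
        key = (self.node, action, observation, signal)
        self.node = sample(self.rng, self.node_dist[key])


# Hypothetical toy problem: the agents should pick the same side at every step.
TRANSITIONS = {"c0": {"c0": 0.5, "c1": 0.5}, "c1": {"c0": 0.5, "c1": 0.5}}
ACTIONS = {("q0", "c0"): {"left": 1.0}, ("q0", "c1"): {"right": 1.0}}
NODES = {("q0", a, "obs", c): {"q0": 1.0}
         for a in ("left", "right") for c in ("c0", "c1")}


def run(steps, shared_seed=7):
    # Each agent holds its own copy of the device, seeded identically (assumption).
    devices = [CorrelationDevice(TRANSITIONS, "c0", shared_seed) for _ in range(2)]
    agents = [LocalController(ACTIONS, NODES, "q0", seed=i) for i in range(2)]
    history = []
    for _ in range(steps):
        signals = [d.step() for d in devices]
        actions = [agent.act(c) for agent, c in zip(agents, signals)]
        for agent, a, c in zip(agents, actions, signals):
            agent.update(a, "obs", c)
        history.append((signals, actions))
    return history
```

Calling `run(50)` returns the per-step signals and actions of both agents. Because both device copies follow the same random sequence, the agents randomize over "left" and "right" yet always choose the same side, which independent randomization alone could not guarantee.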