Reinforcement learning for DEC-MDPs with changing action sets and partially ordered dependencies
Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3
DEC-MDPs with changing action sets and partially ordered transition dependencies have recently been proposed as a sub-class of general DEC-MDPs with provably lower complexity. In this paper, we investigate the applicability of a coordinated batch-mode reinforcement learning algorithm to this class of distributed problems. Our agents acquire their local policies independently of one another through repeated interaction with the DEC-MDP, evolving their policies concurrently; the learning approach builds upon a specialized variant of the neural fitted Q iteration algorithm, enhanced for use in multi-agent settings. We applied our learning approach to various scheduling benchmark problems and obtained encouraging results, showing that problems of current standards of difficulty can be solved approximately very well and, in some cases, optimally.
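
The core building block named above, batch-mode fitted Q iteration, can be sketched in a few lines. The following is a minimal single-agent illustration on a hypothetical four-state chain problem (not the paper's scheduling benchmarks or its multi-agent coordination scheme), using a simple table-averaging regressor where the paper employs a neural network:

```python
import random

# Hypothetical toy problem: a deterministic chain MDP with states 0..3 and
# actions 0 (move left) / 1 (move right). Entering or staying in state 3
# yields reward 1. This stands in for the DEC-MDP only as an illustration.
N_STATES, ACTIONS, GAMMA = 4, (0, 1), 0.9

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == N_STATES - 1 else 0.0), s2

# Batch-mode setting: first collect a fixed batch of transitions by
# random exploration, then learn purely from that batch.
random.seed(0)
batch, s = [], 0
for _ in range(500):
    a = random.choice(ACTIONS)
    r, s2 = step(s, a)
    batch.append((s, a, r, s2))
    s = s2

# Fitted Q iteration: repeatedly regress Q onto one-step bootstrap targets
# r + gamma * max_a' Q(s', a'). Here the "regressor" averages targets per
# (s, a) pair; neural fitted Q iteration would train a network instead.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for _ in range(50):
    targets = {}
    for s, a, r, s2 in batch:
        y = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        targets.setdefault((s, a), []).append(y)
    Q = {sa: sum(ys) / len(ys) for sa, ys in targets.items()}
    for s in range(N_STATES):        # keep any unvisited pairs at 0
        for a in ACTIONS:
            Q.setdefault((s, a), 0.0)

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)  # greedy policy should move right, toward the rewarding state
```

In the paper's multi-agent setting, each agent would run such a learner on its own local observations and action set, with the agents' policies evolving concurrently; this sketch shows only the batch-mode value-iteration core.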