Ensemble methods combine multiple models to increase predictive performance, but they mostly rely on labelled data. In this paper we propose several ensemble methods to learn a combined parameterized state-value function of multiple agents. For this purpose, the Temporal-Difference (TD) and Residual-Gradient (RG) update methods, as well as a policy function, are adapted to learn from joint decisions. Such joint decisions include Majority Voting and Averaging of the state-values. We apply these ensemble methods to the simple pencil-and-paper game Tic-Tac-Toe and show that an ensemble of three agents outperforms a single agent, both in terms of the Mean-Squared Error (MSE) with respect to the true values and in terms of the resulting policy. Furthermore, we apply the same methods to learn the shortest path in a 20 × 20 maze and show empirically that an ensemble of multiple agents learns faster and yields a better policy, i.e. a higher number of correctly chosen actions, than a single agent.
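The two joint-decision rules named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: each agent is assumed to expose a state-value function (here a plain callable on successor states), and `successor` is a hypothetical helper mapping an action to the state it leads to.

```python
def average_decision(agents, actions, successor):
    """Averaging of state-values: pick the action whose successor
    state has the highest value averaged over all agents."""
    def avg_value(action):
        state = successor(action)
        return sum(agent(state) for agent in agents) / len(agents)
    return max(actions, key=avg_value)

def majority_vote_decision(agents, actions, successor):
    """Majority Voting: each agent votes for its own greedy action;
    the action with the most votes wins (ties broken by action order)."""
    votes = {}
    for agent in agents:
        choice = max(actions, key=lambda a: agent(successor(a)))
        votes[choice] = votes.get(choice, 0) + 1
    return max(actions, key=lambda a: votes.get(a, 0))
```

In a game such as Tic-Tac-Toe, `actions` would be the legal moves and `successor` the resulting board position; the same greedy-selection skeleton applies to the maze task.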