Ensemble methods combine multiple models to increase predictive performance, but they mostly rely on labelled data. In this paper we propose several ensemble methods to learn a combined parameterized state-value function of multiple agents. For this purpose, the Temporal-Difference (TD) and Residual-Gradient (RG) update methods, as well as a policy function, are adapted to learn from joint decisions. Such joint decisions include Majority Voting and Averaging of the state-values. We apply these ensemble methods to the simple pencil-and-paper game Tic-Tac-Toe and show that an ensemble of three agents outperforms a single agent, both in terms of the Mean-Squared Error (MSE) with respect to the true values and in terms of the resulting policy. Furthermore, we apply the same methods to learn the shortest path in a 20 × 20 maze and show empirically that an ensemble of multiple agents learns faster and yields a better policy, i.e. a higher number of correctly chosen actions, than a single agent.
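The two joint-decision rules named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: each agent is assumed to expose a state-value function (here a plain callable on successor states), and `successor` is a hypothetical helper mapping an action to the state it leads to.

```python
def average_decision(agents, actions, successor):
    """Averaging of state-values: pick the action whose successor
    state has the highest value averaged over all agents."""
    def avg_value(action):
        state = successor(action)
        return sum(agent(state) for agent in agents) / len(agents)
    return max(actions, key=avg_value)

def majority_vote_decision(agents, actions, successor):
    """Majority Voting: each agent votes for its own greedy action;
    the action with the most votes wins (ties broken by action order)."""
    votes = {}
    for agent in agents:
        choice = max(actions, key=lambda a: agent(successor(a)))
        votes[choice] = votes.get(choice, 0) + 1
    return max(actions, key=lambda a: votes.get(a, 0))
```

In a game such as Tic-Tac-Toe, `actions` would be the legal moves and `successor` the resulting board position; the same greedy-selection skeleton applies to the maze task.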