Ensemble methods for reinforcement learning with function approximation

  • Authors:
  • Stefan Faußer; Friedhelm Schwenker

  • Affiliations:
  • Institute of Neural Information Processing, University of Ulm, Ulm, Germany (both authors)

  • Venue:
  • MCS'11: Proceedings of the 10th International Conference on Multiple Classifier Systems
  • Year:
  • 2011


Abstract

Ensemble methods combine multiple models to increase predictive performance but mostly utilize labelled data. In this paper we propose several ensemble methods to learn a combined parameterized state-value function of multiple agents. For this purpose, the Temporal-Difference (TD) and Residual-Gradient (RG) update methods as well as a policy function are adapted to learn from joint decisions. Such joint decisions include Majority Voting and Averaging of the state-values. We apply these ensemble methods to the simple pencil-and-paper game Tic-Tac-Toe and show that an ensemble of three agents outperforms a single agent both in terms of the Mean-Squared Error (MSE) to the true state-values and in terms of the resulting policy. Further, we apply the same methods to learn the shortest path in a 20 × 20 maze and empirically show that an ensemble of multiple agents learns faster and yields a better policy, i.e. a higher number of correctly chosen actions, than a single agent.
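
The abstract names two combination rules for joint decisions, Averaging of the state-values and Majority Voting, with TD updates driven by the combined estimate. Below is a minimal Python sketch of how such a scheme could look; it is not the authors' implementation, and all constants (N_AGENTS, N_STATES, GAMMA, ALPHA) as well as the tabular value representation and the demo transitions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3   # ensemble size (the paper reports results for three agents)
N_STATES = 20  # hypothetical small state space for illustration
GAMMA = 0.9    # discount factor (assumed)
ALPHA = 0.1    # learning rate (assumed)

# One tabular state-value function per agent, randomly initialised so the
# ensemble members disagree at the start.
values = [rng.normal(0.0, 0.1, N_STATES) for _ in range(N_AGENTS)]

def averaged_value(state):
    """Joint state-value: the mean of all agents' estimates."""
    return np.mean([v[state] for v in values])

def majority_vote(successors):
    """Joint greedy choice: each agent votes for the successor state it
    values highest; the most-voted state wins (ties broken arbitrarily)."""
    votes = [max(successors, key=lambda s: v[s]) for v in values]
    return max(set(votes), key=votes.count)

def td_update(state, next_state, reward):
    """TD(0) step for every agent, bootstrapping on the joint
    (averaged) value of the successor state."""
    target = reward + GAMMA * averaged_value(next_state)
    for v in values:
        v[state] += ALPHA * (target - v[state])

# Tiny demo: walk toward the last state, reward 1 on reaching it.
for _ in range(1000):
    s = int(rng.integers(0, N_STATES - 1))
    s_next = majority_vote([s + 1, min(s + 2, N_STATES - 1)])
    td_update(s, s_next, reward=1.0 if s_next == N_STATES - 1 else 0.0)
```

In this sketch every agent is updated toward the same joint target, so the members share information through the combined estimate while retaining their own parameters, which matches the abstract's idea of learning from joint decisions rather than from each agent's own greedy choice.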