Training regression ensembles by sequential target correction and resampling

  • Authors:
  • Ricardo Ñanculef, Carlos Valle, Héctor Allende, Claudio Moraga

  • Affiliations:
  • Department of Computer Science, Universidad Técnica Federico Santa María, CP 110-V Valparaíso, Chile (Ñanculef, Valle, Allende); European Centre for Soft Computing, 33600 Mieres, Spain, and Faculty of Computer Science, Dortmund University of Technology, 44221 Dortmund, Germany (Moraga)

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012


Abstract

Ensemble methods learn models from examples by generating a set of hypotheses, which are then combined to make a single decision. We propose an algorithm to construct an ensemble for regression estimation. Our proposal generates the hypotheses sequentially using a simple procedure in which the target to be learned by the base learner at each step is modified as a function of the previous step's error. We state a theorem relating an upper bound on the error of the composite hypothesis obtained by this procedure to the training errors of the individual hypotheses. We also show that the proposed procedure yields a learning functional that enforces a weighted form of Negative Correlation with respect to the previous hypotheses. Additionally, we incorporate resampling to let the ensemble control the impact of highly influential data points, and show that this component significantly improves its ability to generalize from the training examples. We describe experiments evaluating our technique on real and synthetic datasets using neural networks as base learners. The results show that our technique achieves considerably lower prediction errors than the Negative Correlation (NC) method and that its performance is competitive with the Bagging and AdaBoost algorithms for regression estimation.
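To make the sequential construction concrete, here is a minimal sketch of the training loop described above. The abstract does not give the exact target-correction rule, combination weights, or resampling scheme, so everything below is an illustrative assumption: the correction shifts the target by the current ensemble error scaled by a hypothetical factor `lam`, resampling is a plain bootstrap, and members are combined by simple averaging. The helper names (`sequential_target_correction`, `predict_ensemble`) are invented for this sketch, not from the paper.

```python
# Hedged sketch of sequential target correction with resampling.
# Assumptions (not from the paper): residual-style correction rule,
# uniform bootstrap resampling, and equal-weight averaging at prediction time.
import numpy as np
from sklearn.neural_network import MLPRegressor

def sequential_target_correction(X, y, n_members=5, lam=0.5, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    F = np.zeros_like(y, dtype=float)  # running sum of member predictions
    for t in range(n_members):
        if t == 0:
            target = y
        else:
            # Corrected target: the original target shifted by the current
            # ensemble error (illustrative rule; the paper defines its own).
            target = y + lam * (y - F / t)
        # Resampling step: bootstrap the data so that highly influential
        # points do not dominate every member's fit (assumed uniform here).
        idx = rng.integers(0, len(X), size=len(X))
        h = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                         random_state=seed + t)
        h.fit(X[idx], target[idx])
        members.append(h)
        F += h.predict(X)
    return members

def predict_ensemble(members, X):
    # Equal-weight averaging; the paper may use a different combination.
    return np.mean([h.predict(X) for h in members], axis=0)
```

Under these assumptions, each new network sees a target pulled away from the regions where the current ensemble already errs, which is what induces the (weighted) negative correlation between members that the theorem in the paper formalizes.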