Online Ensemble Learning: An Empirical Study

  • Authors:
  • Alan Fern; Robert Givan

  • Affiliations:
  • Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. AFERN@ECN.PURDUE.EDU; Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. GIVAN@ECN.PURDUE.EDU

  • Venue:
  • Machine Learning
  • Year:
  • 2003

Abstract

We study resource-limited online learning, motivated by the problem of conditional-branch outcome prediction in computer architecture. In particular, we consider (parallel) time- and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles. Our learning algorithms are inspired by the previously published “boosting by filtering” framework as well as the offline Arc-x4 boosting-style algorithm. We train ensembles of online decision trees using a novel variant of the ID4 online decision-tree algorithm as the base learner, and show empirical results for both boosting- and bagging-style online ensemble methods. Our results evaluate these methods on both our branch-prediction domain and online variants of three familiar machine-learning benchmarks. Our data justify three key claims. First, we show empirically that our extensions to ID4 significantly improve performance for single trees and, additionally, are critical to achieving performance gains in tree ensembles. Second, our results indicate significant improvements in predictive accuracy with ensemble size for the boosting-style algorithm. The bagging algorithms we tried showed poor performance relative to the boosting-style algorithm (but still improved upon individual base learners). Third, we show that ensembles of small trees are often able to outperform single large trees that use the same total number of nodes (and similarly outperform smaller ensembles of larger trees that use the same total number of nodes). This makes online boosting particularly useful in domains such as branch prediction with tight space restrictions (i.e., the available real estate on a microprocessor chip).
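
The boosting-style method referenced above is inspired by the offline Arc-x4 algorithm, which weights each training example for the k-th ensemble member by 1 + m^4, where m is the number of earlier members that misclassify it. The sketch below illustrates that weighting rule in an online setting; it is a minimal illustration only, assuming a hypothetical base-learner interface with weighted `update(x, y, weight)` and `predict(x)` methods, and is not the authors' implementation (which uses a variant of the ID4 online decision-tree algorithm as the base learner).

```python
# Minimal sketch of an online Arc-x4-style boosting ensemble.
# The class name, base-learner interface, and voting scheme are
# illustrative assumptions, not the algorithm as published in the paper.

class OnlineArcX4Ensemble:
    def __init__(self, base_learner_factory, ensemble_size):
        # base_learner_factory would construct an online learner,
        # e.g., an online decision tree.
        self.members = [base_learner_factory() for _ in range(ensemble_size)]

    def predict(self, x):
        # Combine member predictions by unweighted majority vote.
        votes = [m.predict(x) for m in self.members]
        return max(set(votes), key=votes.count)

    def update(self, x, y):
        # Arc-x4-style weighting: each member sees the example with weight
        # 1 + e**4, where e counts earlier members that misclassified it.
        errors = 0
        for member in self.members:
            member.update(x, y, weight=1 + errors ** 4)
            if member.predict(x) != y:
                errors += 1
```

In an online run, each arriving example would first be predicted (e.g., to guess a branch outcome) and then passed to `update` once its true label becomes known.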