Pasting Small Votes for Classification in Large Databases and On-Line

  • Authors:
  • Leo Breiman

  • Affiliations:
  • Statistics Department, University of California, Berkeley, CA 94708. leo@stat.berkeley.edu

  • Venue:
  • Machine Learning
  • Year:
  • 1999

Abstract

Many databases have grown to the point where they cannot fit into the fast memory of even large-memory machines, to say nothing of current workstations. If we want to use these databases to construct predictions of various characteristics, then, since the usual methods require that all data be held in fast memory, various workarounds have to be used. This paper studies one such class of methods, which are computationally fast and give accuracy comparable to that which could have been obtained if all data could have been held in core. The procedure takes small pieces of the data, grows a predictor on each small piece, and then pastes these predictors together. A version is given that scales up to terabyte data sets. The methods are also applicable to on-line learning.
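
The procedure described in the abstract (small pieces, one predictor per piece, predictors pasted together) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the pieces are drawn uniformly at random, the predictors are combined by unweighted majority vote, and a one-split decision stump stands in for whatever predictor is grown on each piece. The names `fit_stump` and `paste_small_votes` and the parameters `bite_size` and `n_bites` are hypothetical, chosen for illustration only.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-split threshold classifier on (features, label) rows.

    Assumption: the stump is only a stand-in for the predictor grown
    on each small piece of the data.
    """
    best = None
    n_features = len(rows[0][0])
    for j in range(n_features):
        for x, _ in rows:
            t = x[j]
            left = [y for xi, y in rows if xi[j] <= t]
            right = [y for xi, y in rows if xi[j] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:
        # No usable split: always predict the majority class of the piece.
        label = Counter(y for _, y in rows).most_common(1)[0][0]
        return lambda x: label
    _, j, t, l_lab, r_lab = best
    return lambda x: l_lab if x[j] <= t else r_lab


def paste_small_votes(data, bite_size, n_bites, seed=0):
    """Grow one predictor per small random piece and combine them by vote.

    Assumption: uniform random pieces and unweighted majority voting.
    Only `bite_size` rows are used to fit any single predictor.
    """
    rng = random.Random(seed)
    predictors = [fit_stump(rng.sample(data, bite_size))
                  for _ in range(n_bites)]

    def predict(x):
        votes = Counter(p(x) for p in predictors)
        return votes.most_common(1)[0][0]

    return predict
```

For example, with `data` given as a list of `(feature_tuple, label)` pairs, `paste_small_votes(data, bite_size=50, n_bites=25)` returns a function that classifies a new feature tuple. In this sketch the whole list is held in memory for simplicity; in a large-database setting each piece would instead be read from disk as needed.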