A Survey of Methods for Scaling Up Inductive Algorithms

  • Authors:
  • Foster Provost; Venkateswarlu Kolluri

  • Affiliations:
  • Bell Atlantic Science and Technology, 500 Westchester Avenue, White Plains, New York 10604. provost@acm.org; Department of Information Science, University of Pittsburgh, Pittsburgh, PA 15260, and Lycos, Inc., 5001 Centre Avenue, Pittsburgh, PA 15213. venkat@sis.pitt.edu

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 1999

Abstract

One of the defining challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, and compares existing work on scaling up inductive algorithms. We concentrate on algorithms that build decision trees and rule sets, in order to provide focus and specific details; the issues and techniques generalize to other types of data mining. We begin with a discussion of important issues related to scaling up. We highlight similarities among scaling techniques by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent techniques, drawing on specific examples from published papers. Finally, we use the preceding analysis to suggest how to proceed when dealing with a large problem, and where to focus future research.