Data mining tasks and methods: scalability

  • Authors:
  • Foster Provost; Venkateswarlu Kolluri

  • Affiliations:
  • Associate Professor of Information Systems, Leonard N. Stern School of Business, New York University, New York; Research Scientist, Terra Lycos, Waltham, Massachusetts

  • Venue:
  • Handbook of data mining and knowledge discovery
  • Year:
  • 2002

Abstract

One of the defining challenges for the KDD research community is scaling up data mining algorithms to mine very large collections of data. This article summarizes, categorizes, and compares existing work on scaling up data mining algorithms. In order to provide focus and specific details, we concentrate on algorithms that build decision trees and rule sets; the issues and techniques generalize to other types of data mining. We discuss the important issues related to scaling up and highlight similarities among scaling techniques by categorizing them into three main approaches. We describe in detail the characteristic features of each category, using specific examples as needed, and we compare and contrast different constituent techniques.