High performance data mining

  • Authors:
  • Vipin Kumar;Mahesh V. Joshi;Eui-Hong Han;Pang-Ning Tan;Michael Steinbach

  • Affiliations:
  • University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN

  • Venue:
  • VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent times have seen an explosive growth in the availability of various kinds of data. It has resulted in an unprecedented opportunity to develop automated data-driven techniques of extracting useful knowledge. Data mining, an important step in this process of knowledge discovery, consists of methods that discover interesting, non-trivial, and useful patterns hidden in the data [SAD+93, CHY96]. The field of data mining builds upon the ideas from diverse fields such as machine learning, pattern recognition, statistics, database systems, and data visualization. But, techniques developed in these traditional disciplines are often unsuitable due to some unique characteristics of today's data-sets, such as their enormous sizes, high-dimensionality, and heterogeneity. There is a necessity to develop effective parallel algorithms for various data mining techniques. However, designing such algorithms is challenging, and the main focus of the paper is a description of the parallel formulations of two important data mining algorithms: discovery of association rules, and induction of decision trees for classification. We also briefly discuss an application of data mining to the analysis of large data sets collected by Earth observing satellites that need to be processed to better understand global scale changes in biosphere processes and patterns.