Design and Evaluation of a High-Level Interface for Data Mining

  • Authors:
  • Ruoming Jin; Gagan Agrawal

  • Venue:
  • IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
  • Year:
  • 2002

Abstract

This paper presents a case study in developing an application-class-specific high-level interface for shared memory parallel programming. The application class we focus on is data mining. With the availability of large datasets in areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, data mining tasks have become an important application class for high performance computing.

Our study of a number of common data mining algorithms has shown that the same set of parallelization techniques can be applied to all of them. To exploit this, we have developed a reduction-object-based interface for rapidly specifying a shared memory parallel data mining algorithm. The set of parallelization techniques we target includes full replication, optimized full locking, and cache-sensitive locking. We show how our runtime system can apply any of these techniques starting from a common specification.

We have evaluated our high-level interface and the parallelization techniques using apriori association mining and k-means clustering algorithms. Our experimental results show that the overhead of the interface is within 10% and that our parallelization techniques scale well.
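
To make the idea of a common specification concrete, the following is a minimal sketch, not the paper's actual interface. It assumes a reduction object that is a fixed-size array of counters updated only by addition, and a user-supplied function (here `count_items`) that processes one transaction. All names (`ReductionObject`, `LockedReductionObject`, `run_full_replication`, `run_full_locking`) are hypothetical. Plain per-element locking stands in for the locking family; the memory-layout aspects suggested by "optimized full locking" and "cache-sensitive locking" are not modeled.

```python
import threading
from typing import Callable, List, Sequence


class ReductionObject:
    """Hypothetical reduction object: counters updated only by addition."""

    def __init__(self, size: int) -> None:
        self.values: List[int] = [0] * size

    def add(self, index: int, amount: int) -> None:
        self.values[index] += amount


class LockedReductionObject(ReductionObject):
    """Same interface, but every element is guarded by its own lock."""

    def __init__(self, size: int) -> None:
        super().__init__(size)
        self.locks = [threading.Lock() for _ in range(size)]

    def add(self, index: int, amount: int) -> None:
        with self.locks[index]:
            self.values[index] += amount


Process = Callable[[Sequence[int], ReductionObject], None]


def _run(data, targets, process: Process) -> None:
    """Give each thread one strided chunk of the data and one target object."""
    n = len(targets)

    def worker(target: ReductionObject, chunk) -> None:
        for item in chunk:
            process(item, target)

    threads = [
        threading.Thread(target=worker, args=(targets[i], data[i::n]))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_full_replication(data, size: int, process: Process, num_threads: int = 4):
    """Each thread updates a private copy without locks; copies are merged at the end."""
    copies = [ReductionObject(size) for _ in range(num_threads)]
    _run(data, copies, process)
    merged = ReductionObject(size)
    for copy in copies:
        for i, v in enumerate(copy.values):
            merged.add(i, v)
    return merged.values


def run_full_locking(data, size: int, process: Process, num_threads: int = 4):
    """All threads update one shared object, taking a lock per element."""
    shared = LockedReductionObject(size)
    _run(data, [shared] * num_threads, process)
    return shared.values


def count_items(transaction: Sequence[int], robj: ReductionObject) -> None:
    """The common specification: count each item in one transaction."""
    for item in transaction:
        robj.add(item, 1)


transactions = [[0, 1], [1, 2], [0, 1, 2], [1]]
replicated = run_full_replication(transactions, 3, count_items)
locked = run_full_locking(transactions, 3, count_items)
```

In this sketch, `count_items` is written once and never mentions threads or locks; the choice of strategy lives entirely in the runner, which is the property the abstract attributes to the runtime system.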