Frequent Itemset Mining on Large-Scale Shared Memory Machines

  • Authors:
  • Yan Zhang;Fan Zhang;Jason Bakos

  • Venue:
  • CLUSTER '11 Proceedings of the 2011 IEEE International Conference on Cluster Computing
  • Year:
  • 2011

Abstract

Frequent Itemset Mining (FIM) is a data mining task that finds frequently occurring subsets among a database of itemsets. FIM is a non-numerical, data-intensive computation that is frequently used in machine learning and computational biology applications. The development of increasingly efficient FIM algorithms is an active field, but exposing and exploiting parallelism is rarely emphasized in new FIM algorithms. In this paper, we explore parallel implementations of two FIM algorithms, Apriori and Eclat, each using three different representations: vertical transaction id set, vertical bit vector, and diffset. We implemented these algorithms using OpenMP and evaluated their scalability on "Blacklight", the TeraGrid 4096-core Intel Nehalem-EX SGI Altix shared-memory machine, using from 16 processors (one blade) to 256 processors (16 blades). We found that, while scalability generally depends on the input data, Apriori is scalable only when used with diffset. Eclat, by contrast, is generally scalable but achieves its best scalability with diffset.
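
The three representations named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' OpenMP implementation; the toy database and the helper names `tidsets` and `bitvectors` are assumptions made here for illustration. It computes the support of the itemset {a, c} three ways: by intersecting transaction id sets, by AND-ing bit vectors, and by subtracting the size of a diffset (the transactions that contain a but not c) from the support of a.

```python
# Toy database (hypothetical): transaction id -> set of items.
transactions = {
    0: {"a", "b", "c"},
    1: {"a", "c"},
    2: {"a", "d"},
    3: {"b", "c"},
    4: {"a", "b", "c"},
}


def tidsets(db):
    """Vertical tidset layout: item -> set of transaction ids containing it."""
    out = {}
    for tid, items in db.items():
        for item in items:
            out.setdefault(item, set()).add(tid)
    return out


def bitvectors(db):
    """Vertical bit-vector layout: item -> int whose bit `tid` is set."""
    out = {}
    for tid, items in db.items():
        for item in items:
            out[item] = out.get(item, 0) | (1 << tid)
    return out


t = tidsets(transactions)
b = bitvectors(transactions)

# 1. Tidset: support of {a, c} is the size of the intersection.
support_tid = len(t["a"] & t["c"])

# 2. Bit vector: support is the population count of the bitwise AND.
support_bit = bin(b["a"] & b["c"]).count("1")

# 3. Diffset: store only the transactions that contain a but not c,
#    then derive the support from the support of a.
diffset_ac = t["a"] - t["c"]
support_diff = len(t["a"]) - len(diffset_ac)

print(support_tid, support_bit, support_diff)
```

All three computations agree on the support. In this toy case the diffset holds one transaction id while the intersection holds three, which suggests why the smaller intermediate sets of the diffset form could help on dense data.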