Apriori-based frequent itemset mining algorithms on MapReduce

  • Authors:
  • Ming-Yen Lin;Pei-Yu Lee;Sue-Chen Hsueh

  • Affiliations:
  • Feng Chia University, Taichung, Taiwan;Feng Chia University, Taichung, Taiwan;Chaoyang University of Technology, Taichung, Taiwan

  • Venue:
  • Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many parallelization techniques have been proposed to enhance the performance of the Apriori-like frequent itemset mining algorithms. Characterized by both map and reduce functions, MapReduce has emerged and excels in the mining of datasets of terabyte scale or larger in either homogeneous or heterogeneous clusters. Minimizing the scheduling overhead of each map-reduce phase and maximizing the utilization of nodes in each phase are keys to successful MapReduce implementations. In this paper, we propose three algorithms, named SPC, FPC, and DPC, to investigate effective implementations of the Apriori algorithm in the MapReduce framework. DPC features in dynamically combining candidates of various lengths and outperforms both the straight-forward algorithm SPC and the fixed passes combined counting algorithm FPC. Extensive experimental results also show that all the three algorithms scale up linearly with respect to dataset sizes and cluster sizes.