Optimization of frequent itemset mining on multiple-core processor

  • Authors:
  • Li Liu;Eric Li;Yimin Zhang;Zhizhong Tang

  • Affiliations:
  • Tsinghua University, Beijing, China and Intel China Research Center, Beijing, China;Intel China Research Center, Beijing, China;Intel China Research Center, Beijing, China;Tsinghua University, Beijing, China

  • Venue:
  • VLDB '07 Proceedings of the 33rd international conference on Very large data bases
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Multi-core processors are proliferated across different domains in recent years. In this paper, we study the performance of frequent pattern mining on a modern multi-core machine. A detailed study shows that, even with the best implementation, current FP-tree based algorithms still under-utilize a multi-core system due to poor data locality and insufficient parallelism expression. We propose two techniques: a cache-conscious FP-array (frequent pattern array) and a lock-free dataset tiling parallelization mechanism to address this problem. The FP-array efficiently improves the data locality performance, and makes use of the benefits from hardware and software prefetching. The result yields an overall 4.0 speedup compared with the state-of-the-art implementation. Furthermore, to unlock the power of multi-core processor, a lock-free parallelization approach is proposed to restructure the FP-tree building algorithm. It not only eliminates the locks in building a single FP-tree with fine-grained threads, but also improves the temporal data locality performance. To summarize, with the proposed cache-conscious FP-array and lock-free parallelization enhancements, the overall FP-tree algorithm achieves a 24 fold speedup on an 8-core machine. Finally, we believe the presented techniques can be applied to other data mining tasks as well with the prevalence of multi-core processor.