BIDE-Based Parallel Mining of Frequent Closed Sequences with MapReduce

  • Authors:
  • Dongjin Yu; Wei Wu; Suhang Zheng; Zhixiang Zhu

  • Affiliations:
  • School of Computer, Hangzhou Dianzi University, Hangzhou, China; Zhejiang Provincial Key Laboratory of Network Technology and Information Security, Hangzhou, China; School of Computer, Hangzhou Dianzi University, Hangzhou, China; School of Computer, Hangzhou Dianzi University, Hangzhou, China

  • Venue:
  • ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
  • Year:
  • 2012

Abstract

Parallel processing is essential for mining frequent closed sequences from massive volumes of data in a timely manner. MapReduce, in turn, is an ideal software framework for distributed computing over large data sets on clusters of computers. In this paper, we develop a parallel implementation of the BIDE algorithm on MapReduce, called BIDE-MR. It iteratively assigns the tasks of closure checking and pruning to different nodes in the cluster. Each round of map-combine-partition-reduce generates the closed frequent sequences of the round-specific length, together with the candidates for the next round of computation. Since the candidates and their pseudo-projected databases are independent of each other, BIDE-MR achieves high speed-ups. We implement BIDE-MR on an Apache Hadoop cluster and use it to mine vehicles that frequently appear together from massive records collected at different monitoring sites. The results show that BIDE-MR attains good parallelization.
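
The round structure described in the abstract can be illustrated with a minimal single-process Python sketch. It is not the authors' BIDE-MR implementation, and it makes several simplifying assumptions: sequences consist of single items, BIDE's bidirectional extension check and pruning are replaced by a brute-force closure test (a pattern is closed if inserting any one item at any position lowers its support), support is recounted by scanning the whole database instead of using pseudo-projected databases, and the map and reduce phases are simulated with ordinary loops rather than Hadoop. All function names (`map_candidate`, `mine_closed`, and so on) are hypothetical.

```python
from collections import defaultdict


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count the database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def is_closed(pattern, sup, database, items):
    """Brute-force stand-in for BIDE's closure check (an assumption of this
    sketch): closed means no one-item insertion keeps the same support."""
    for pos in range(len(pattern) + 1):
        for item in items:
            candidate = pattern[:pos] + (item,) + pattern[pos:]
            if support(candidate, database) == sup:
                return False
    return True


def map_candidate(prefix, database, items, min_sup):
    """Simulated map task for one candidate prefix. Emits the prefix if it
    is closed, plus every frequent one-item extension for the next round."""
    sup = support(prefix, database)
    out = []
    if is_closed(prefix, sup, database, items):
        out.append(("closed", (prefix, sup)))
    for item in items:
        extended = prefix + (item,)
        if support(extended, database) >= min_sup:
            out.append(("candidate", extended))
    return out


def mine_closed(database, min_sup):
    """Run rounds until no candidates remain; round k handles length-k prefixes."""
    items = sorted({x for seq in database for x in seq})
    candidates = [(i,) for i in items if support((i,), database) >= min_sup]
    closed = {}
    while candidates:
        grouped = defaultdict(list)  # simulated shuffle: group emitted values by key
        for prefix in candidates:    # candidates are independent of each other
            for key, value in map_candidate(prefix, database, items, min_sup):
                grouped[key].append(value)
        closed.update(grouped["closed"])   # closed sequences of this round's length
        candidates = grouped["candidate"]  # input for the next round
    return closed


if __name__ == "__main__":
    db = [tuple("CAABC"), tuple("ABCB"), tuple("CABC"), tuple("ABBCA")]
    for pattern, sup in sorted(mine_closed(db, min_sup=2).items()):
        print("".join(pattern), sup)
```

Because each candidate is processed without reference to any other candidate, the inner loop over `candidates` is the part that a cluster could distribute across nodes. On the four-sequence toy database in the `__main__` block with a minimum support of 2, the sketch reports six closed sequences: AA (2), ABB (2), ABC (4), CA (3), CABC (2), and CB (3).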