Parallel algorithm for mining frequent closed sequences

  • Authors:
  • Chuanxiang Ma;Qinghua Li

  • Affiliations:
  • School of Computer Science, Huazhong University of Science and Technology, China;School of Computer Science, Huazhong University of Science and Technology, China

  • Venue:
  • AIS-ADM 2005: Proceedings of the 2005 International Conference on Autonomous Intelligent Systems: Agents and Data Mining
  • Year:
  • 2005

Abstract

Previous studies have argued convincingly that a frequent sequence mining algorithm should mine only the closed frequent sequences rather than all frequent sequences, because the closed set is more compact yet still complete and can be mined more efficiently. However, mining frequent closed sequences remains challenging on a stand-alone machine because of the large size and high dimensionality of the data. This paper presents PFCSeq, an algorithm for mining frequent closed sequences on a distributed-memory parallel machine. Each processor mines its local set of frequent closed sequences independently, using a combination of task parallelism and data parallelism, and only two communications are needed unless a load imbalance is detected. The time spent on communication is therefore significantly reduced. To ensure good load balance among processors, a dynamic workload balancing strategy is proposed. Experiments show that the algorithm scales linearly with database size and with the number of processors.
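To make the notion of a closed sequence concrete, the following is a minimal brute-force sketch in Python. It is not PFCSeq and does nothing in parallel. It uses the standard definition: a frequent sequence is closed if no proper super-sequence has the same support. It also assumes, for simplicity, that each sequence is a tuple of single items rather than of itemsets. The function names and the toy database are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]


def is_subsequence(pattern: Seq, sequence: Seq) -> bool:
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(pattern: Seq, database: List[Seq]) -> int:
    """Number of database sequences that contain pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_sequences(database: List[Seq], min_support: int) -> Dict[Seq, int]:
    """Enumerate all frequent sequences level by level (assumes min_support >= 1).

    Every prefix of a frequent sequence is itself frequent, so growing
    only frequent prefixes by one item at a time misses nothing.
    """
    items = sorted({item for seq in database for item in seq})
    frequent: Dict[Seq, int] = {}
    frontier: List[Seq] = [()]
    while frontier:
        next_frontier: List[Seq] = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                count = support(candidate, database)
                if count >= min_support:
                    frequent[candidate] = count
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


def closed_sequences(frequent: Dict[Seq, int]) -> Dict[Seq, int]:
    """Keep only sequences with no longer super-sequence of equal support."""
    closed: Dict[Seq, int] = {}
    for pattern, sup in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_sup == sup
            and is_subsequence(pattern, other)
            for other, other_sup in frequent.items()
        )
        if not absorbed:
            closed[pattern] = sup
    return closed


# Toy database (hypothetical): four sequences over the items a, b, c.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
all_frequent = frequent_sequences(database, min_support=2)
closed = closed_sequences(all_frequent)
print(len(all_frequent), "frequent,", len(closed), "closed")
```

On this toy database with a minimum support of 2, seven sequences are frequent but only four are closed. For example, the sequence (a) has support 3, as does its super-sequence (a, c), so (a) is dropped without losing information. This is the compactness the abstract refers to.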