Parallel and Distributed Frequent Pattern Mining in Large Databases

Authors:
Syed Khairuzzaman Tanbeer;Chowdhury Farhan Ahmed;Byeong-Soo Jeong
Venue:
HPCC '09 Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications
Year:
2009

Abstract

Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. Among them, parallelizations of the FP-growth algorithm based on the FP-tree have proved highly efficient. However, FP-tree-based techniques suffer from two major limitations: they require multiple database scans (i.e., high I/O cost), and they incur high inter-processor communication cost during the mining phase. Therefore, we propose a novel tree structure, called the PP-tree (Parallel Pattern tree), that significantly reduces the I/O cost by capturing the database contents with a single scan and facilitates efficient FP-growth mining with reduced inter-processor communication overhead. Our parallel algorithm works independently at each local site and locally generates global frequent patterns, which are merged at the final stage. The experimental results show that parallel and distributed FP mining with the PP-tree outperforms other state-of-the-art algorithms.
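
The single-scan idea can be illustrated with a minimal sketch. This is not the PP-tree construction itself, whose details the abstract does not give. It assumes a plain prefix tree with items inserted in lexicographic order, and all names (`Node`, `build_tree`, `support`, `global_support`, the two example sites) are hypothetical. Each site builds its tree and item counts in one pass over its local data. The per-item counts are then summed into global supports, and an itemset's global support is the sum of its supports in the local trees.

```python
from collections import Counter


class Node:
    """Prefix-tree node: an item, a count, and child nodes keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_tree(transactions):
    """Build a prefix tree and per-item counts in one pass over the data."""
    root, item_counts = Node(), Counter()
    for transaction in transactions:
        root.count += 1  # the root counts every transaction
        node = root
        # Assumption: a fixed lexicographic item order, known before the scan.
        for item in sorted(set(transaction)):
            item_counts[item] += 1
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, item_counts


def support(node, itemset):
    """Count transactions under `node` containing the sorted tuple `itemset`."""
    if not itemset:
        return node.count
    total = 0
    for item, child in node.children.items():
        if item == itemset[0]:
            total += support(child, itemset[1:])
        elif item < itemset[0]:
            # The wanted item may still appear deeper on this path.
            total += support(child, itemset)
    return total


def global_support(trees, itemset):
    """Sum an itemset's support over all local trees."""
    return sum(support(tree, tuple(sorted(itemset))) for tree in trees)


# Two hypothetical sites, each scanning only its own transactions once.
site_a = [["a", "b", "c"], ["a", "c"], ["b", "d"]]
site_b = [["a", "c", "d"], ["b", "c"]]

tree_a, counts_a = build_tree(site_a)
tree_b, counts_b = build_tree(site_b)

# Only the small per-item counters need to be exchanged at this point.
global_counts = counts_a + counts_b
frequent_items = sorted(i for i, c in global_counts.items() if c >= 3)

print(frequent_items)
print(global_support([tree_a, tree_b], ["a", "c"]))  # prints 3
```

The sketch stops short of FP-growth mining. It only shows that a tree built in a single scan retains enough information to answer support queries without rereading the database.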