A generalized parallel algorithm for frequent itemset mining

Authors:
Mitica Craus; Alexandru Archip
Affiliations:
"Gh. Asachi" Technical University, Department of Computer Engineering, Iasi, Romania; "Gh. Asachi" Technical University, Department of Computer Engineering, Iasi, Romania
Venue:
ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
Year:
2008

Abstract

A parallel algorithm for finding the frequent itemsets in a set of transactions is presented. Each frequent individual item is identified by its index. We assume that the number of processors, $m$, is less than the number of frequent items, $n$. In the first stage, every processor $P_i$, $i \in \{1, \ldots, m-1\}$, sequentially computes the frequent itemsets from the interval $I_i = [(i-1) \cdot p + 1,\ i \cdot p]$, where $p = \lfloor n/m \rfloor$. The processor $P_m$ computes the frequent itemsets from the interval $I_m = [(m-1) \cdot p + 1,\ n]$. In the second stage, the parallel algorithm is applied. The processor $P_i$ computes, step by step, the sets $F_{I_i,I_j}$ of the frequent itemsets with individual items from the intervals $I_{i,j} = I_i \cup I_{i+1} \cup \ldots \cup I_j$, $j = i+1, \ldots, m$. To compute the set $F_{I_i,I_j}$, the processor $P_i$ uses $F_{I_i,I_{j-1}}$, obtained in the previous step, and $F_{I_{i+1},I_j}$, received from the processor $P_{i+1}$. The main advantage of our parallel algorithm is that its communication pattern is known before the algorithm starts, which allows the communication to be mapped onto the hardware. Another major advantage is that the set of transactions can be distributed to the processors before the algorithm begins. This is possible because a processor $P_i$ has to compute $F_{I_i,I_j}$, $j = i+1, \ldots, m$, and therefore needs only the transactions containing the frequent items starting with $I_i$.
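
The interval partitioning and the fixed communication pattern described in the abstract can be illustrated with a minimal Python sketch. The function names `partition_items` and `communication_schedule` are hypothetical and do not come from the paper. The sketch also assumes that the "steps" of the second stage can be indexed by s = j - i, so that at step s processor P_i combines its own previous result with the set received from P_(i+1). The abstract does not state this timing explicitly, and the actual merge of the two itemset collections is not specified there, so it is not modeled here.

```python
from typing import List, Tuple


def partition_items(n: int, m: int) -> List[Tuple[int, int]]:
    """Return the closed index intervals I_1..I_m for n frequent items
    and m processors, following the formulas in the abstract."""
    if not 0 < m < n:
        raise ValueError("the abstract assumes fewer processors than frequent items")
    p = n // m  # p = floor(n / m)
    # Processors P_1..P_(m-1) each get p consecutive item indices.
    intervals = [((i - 1) * p + 1, i * p) for i in range(1, m)]
    # Processor P_m takes everything that remains, up to index n.
    intervals.append(((m - 1) * p + 1, n))
    return intervals


def communication_schedule(m: int) -> List[Tuple[int, int, int, int]]:
    """List the second-stage transfers as (step, receiver, sender, j).

    Assumption (not stated in the abstract): at step s, processor P_i
    computes the set for intervals I_i..I_j with j = i + s, using the set
    for I_(i+1)..I_j received from P_(i+1).
    """
    schedule = []
    for step in range(1, m):
        for i in range(1, m - step + 1):
            j = i + step
            schedule.append((step, i, i + 1, j))
    return schedule


if __name__ == "__main__":
    print(partition_items(10, 3))  # [(1, 3), (4, 6), (7, 10)]
    for step, receiver, sender, j in communication_schedule(3):
        print(f"step {step}: P{receiver} receives the set for "
              f"I{sender}..I{j} from P{sender}")
```

Under these assumptions the schedule depends only on m, which matches the abstract's claim that the communication pattern is known before the algorithm starts: each processor only ever receives from its successor, and the total number of transfers is m(m - 1)/2.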