A generalized parallel algorithm for frequent itemset mining

  • Authors:
  • Mitica Craus;Alexandru Archip

  • Affiliations:
  • "Gh. Asachi" Technical University, Department of Computer Engineering, Iasi, Romania;"Gh. Asachi" Technical University, Department of Computer Engineering, Iasi, Romania

  • Venue:
  • ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
  • Year:
  • 2008

Abstract

A parallel algorithm for finding the frequent itemsets in a set of transactions is presented. The frequent individual items are identified by their index. We assume that the number of processors (m) is less than the number of frequent items (n). In the first stage, every processor Pi, i ∈ {1, ..., m - 1}, sequentially computes the frequent itemsets from the interval Ii = [(i - 1) · p + 1, i · p], where p = ⌊n/m⌋. The processor Pm computes the frequent itemsets from the interval Im = [(m - 1) · p + 1, n]. In the second stage, the parallel algorithm is applied. The processor Pi computes, step by step, the sets FIi,Ij of the frequent itemsets with individual items from the intervals Ii,j = Ii ∪ Ii+1 ∪ ... ∪ Ij, j = i + 1, ..., m. To compute the set FIi,Ij, the processor Pi uses FIi,Ij-1, obtained in the previous step, and FIi+1,Ij, received from the processor Pi+1. The main advantage of our parallel algorithm is that its communication pattern is known before the algorithm starts, which allows the communication to be mapped onto hardware. Another major advantage is that the set of transactions can be distributed to the processors before the algorithm begins. This is possible because a processor Pi has to compute FIi,Ij, j = i + 1, ..., m, and therefore needs only the transactions containing the frequent items starting with Ii.
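The index arithmetic and the fixed communication pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names partition_intervals and communication_schedule are hypothetical, and grouping the second-stage transfers into steps by d = j - i is an assumption about what "step by step" means. The sketch only enumerates intervals and transfers; how two sets of frequent itemsets are combined is not stated in the abstract and is not modelled here.

```python
def partition_intervals(n, m):
    """Return the 1-indexed, inclusive intervals I1..Im of frequent-item indices.

    n is the number of frequent items and m the number of processors.
    """
    if not 0 < m < n:
        raise ValueError("the abstract assumes 0 < m < n")
    p = n // m  # p = floor(n / m)
    # Processors P1..P(m-1) each get exactly p consecutive indices.
    intervals = [((i - 1) * p + 1, i * p) for i in range(1, m)]
    # The last processor Pm takes everything that is left, up to n.
    intervals.append(((m - 1) * p + 1, n))
    return intervals


def communication_schedule(m):
    """Enumerate second-stage transfers, grouped by step d = j - i (assumed).

    Each transfer is a tuple (sender, receiver, (i + 1, j)): processor
    P(i+1) sends the set the abstract writes as FIi+1,Ij to processor Pi.
    The pair (k, k) stands for the first-stage result of processor Pk.
    """
    schedule = []
    for d in range(1, m):
        step = []
        for i in range(1, m - d + 1):
            j = i + d
            step.append((i + 1, i, (i + 1, j)))
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(partition_intervals(10, 3))
    for d, step in enumerate(communication_schedule(3), start=1):
        print("step", d, step)
```

Because the schedule depends only on m, it can be computed before any transaction is read, which is the property the abstract highlights when it says the communication can be mapped onto hardware.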