A New Parallel Algorithm for the Frequent Itemset Mining Problem

  • Authors: Mitica Craus
  • Affiliations: -
  • Venue: ISPDC '08 Proceedings of the 2008 International Symposium on Parallel and Distributed Computing
  • Year: 2008

Quantified Score

Hi-index 0.00

Abstract

A new parallel algorithm for finding the frequent itemsets in databases is presented. It differs fundamentally from the well-known Apriori algorithm, in which the size of the new frequent itemsets increases by 1 at the beginning of every step. In our algorithm, the frequent itemsets are determined by progressively enlarging the interval to which the individual items belong: if at the k-th step the new candidates come from the intervals [i, i+k], i=1,2,...,n-k, then at the next step, k+1, the new candidates belong to the intervals [i, i+k+1], i=1,2,...,n-k-1. The frequent individual items are identified by their index. The basic idea is that the new frequent itemsets with individual items from the interval [i,j] contain both item i and item j. The frequent itemsets are built by sharing the work among n processors: processor P[i] computes, step by step, the sets F[i,j] of the frequent itemsets with individual items from the intervals [i,j], j=i,...,n. To compute the set F[i,j], processor P[i] uses F[i,j-1], obtained in the previous step, and F[i+1,j], received from processor P[i+1]. The main advantage of our parallel algorithm is that its communication pattern is known before the algorithm starts, which allows the communication to be mapped onto the hardware. Another major advantage is that the set of transactions can be distributed to the processors before the computation begins. This is possible because processor P[i] has to compute F[i,j], j=i,...,n, and therefore needs only the transactions containing the frequent item i.
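
The interval recurrence described in the abstract can be illustrated with a small sequential sketch. This is one possible reading of the abstract, not the paper's implementation: the function names `support` and `interval_frequent_itemsets`, the choice to store in `F[(i, j)]` all frequent itemsets with items in [i, j], and the candidate test (extend an itemset from F[i,j-1] that contains i by the item j, and require that the result without i appears in F[i+1,j]) are assumptions made for illustration. The n processors are simulated by an inner loop.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def interval_frequent_itemsets(transactions, n, min_support):
    """Sequentially simulate the interval scheme for items indexed 1..n.

    F[(i, j)] holds the frequent itemsets whose items all lie in [i, j]
    (an assumed representation; the abstract does not fix one).
    """
    F = {}
    for i in range(1, n + 1):
        single = frozenset([i])
        F[(i, i)] = {single} if support(single, transactions) >= min_support else set()

    for k in range(1, n):                  # step k handles the intervals [i, i+k]
        for i in range(1, n - k + 1):      # conceptually the work of processor P[i]
            j = i + k
            left = F[(i, j - 1)]           # P[i]'s own result from the previous step
            right = F[(i + 1, j)]          # what P[i] would receive from P[i+1]
            # P[i] only needs the transactions that contain item i.
            local = [t for t in transactions if i in t]
            new = set()
            for a in left:
                if i not in a:
                    continue               # new itemsets must contain item i
                cand = a | {j}             # ... and item j
                # Prune: the candidate without i must already be frequent.
                if (cand - {i}) in right and support(cand, local) >= min_support:
                    new.add(cand)
            F[(i, j)] = left | right | new
    return F


if __name__ == "__main__":
    tx = [frozenset(t) for t in ([1, 2, 3], [1, 2], [2, 3], [1, 3], [1, 2, 3])]
    result = interval_frequent_itemsets(tx, n=3, min_support=3)
    for itemset in sorted(result[(1, 3)], key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset))
```

In this sketch the data dependencies are fixed in advance: at step k, the entry for [i, i+k] reads only the entry for [i, i+k-1] and the entry for [i+1, i+k], which mirrors the statically known communication pattern the abstract highlights. Counting support on `local` gives the same result as counting on all transactions because every new candidate contains item i.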