Parallel Association Rule Mining with Minimum Inter-Processor Communication

  • Authors:
  • Mohammad El-Hajj; Osmar R. Zaïane

  • Venue:
  • DEXA '03 Proceedings of the 14th International Workshop on Database and Expert Systems Applications
  • Year:
  • 2003

Abstract

Existing parallel association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is that most of the parallel algorithms for a shared nothing environment are Apriori-based algorithms. Apriori-based algorithms are proven to be not scalable due to many reasons, mainly: (1) the repetitive I/O disk scans, (2) the huge computation and communication involved during the candidacy generation. This paper proposes a new disk-based parallel association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout called Inverted Matrix that prevents multiple scanning of the database during the mining phase, in which finding globally frequent patterns could be achieved in less than a full scan with random access. This data structure is replicated among the parallel nodes. Second, for each frequent item assigned to a parallel node, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple and non-recursive mining process reduces the memory requirements as minimum candidacy generation and counting is needed, and no communication between nodes is required to generate all globally frequent patterns.
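
The three ideas in the abstract can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' implementation: the abstract does not specify the on-disk layout or the tree structure, so the sketch assumes an in-memory item-indexed layout ordered by ascending support, and it replaces the per-item co-occurrence tree with a plain counter over item combinations. All names (`build_inverted_layout`, `mine_item`, `mine_all`, `n_nodes`) are hypothetical. What it does show is why no inter-node communication is needed: every frequent pattern is counted entirely from the row of its least frequent item, so each simulated node works only on the items assigned to it, using its own replica of the layout.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_inverted_layout(transactions):
    """Assumed layout: one row per item, items ranked by ascending support.

    For every transaction containing an item, its row stores the items of
    that transaction that come later in the ranking.
    """
    support = Counter(item for t in transactions for item in set(t))
    order = sorted(support, key=lambda i: (support[i], i))  # ties broken by name
    rank = {item: r for r, item in enumerate(order)}
    rows = defaultdict(list)
    for t in transactions:
        items = sorted(set(t), key=rank.__getitem__)
        for pos, item in enumerate(items):
            rows[item].append(tuple(items[pos + 1:]))
    return support, order, rows


def mine_item(item, rows, support, min_support):
    """Count patterns whose lowest-ranked item is `item`, using only its row.

    Stand-in for the per-item co-occurrence tree: a flat, non-recursive
    enumeration (exponential in row width, fine for a toy example).
    """
    counts = Counter()
    for suffix in rows[item]:
        frequent_suffix = [i for i in suffix if support[i] >= min_support]
        for k in range(1, len(frequent_suffix) + 1):
            for combo in combinations(frequent_suffix, k):
                counts[(item,) + combo] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def mine_all(transactions, min_support, n_nodes=2):
    """Simulate n_nodes workers, each holding a replica of the layout."""
    support, order, rows = build_inverted_layout(transactions)
    frequent = [i for i in order if support[i] >= min_support]
    patterns = {(i,): support[i] for i in frequent}
    for node in range(n_nodes):
        # Round-robin assignment of frequent items; nodes never exchange counts.
        for item in frequent[node::n_nodes]:
            patterns.update(mine_item(item, rows, support, min_support))
    return patterns


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
patterns = mine_all(transactions, min_support=3)
```

Only the rows of frequent items are ever read during mining, which loosely mirrors the "less than a full scan" claim. Because each pattern has exactly one lowest-ranked item, the per-item results are disjoint and can simply be merged, whatever value `n_nodes` takes.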