A parallel algorithm for mining multiple partial periodic patterns
Information Sciences: an International Journal
Existing parallel association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is that most parallel algorithms for a shared-nothing environment are Apriori-based. Apriori-based algorithms have been shown not to scale well, mainly because of (1) repetitive disk I/O scans and (2) the heavy computation and communication involved in candidate generation. This paper proposes a new disk-based parallel association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout, called Inverted Matrix, that prevents multiple scans of the database during the mining phase; finding globally frequent patterns can be achieved in less than a full scan with random access. This data structure is replicated among the parallel nodes. Second, for each frequent item assigned to a parallel node, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple, non-recursive mining process reduces memory requirements, since minimal candidate generation and counting are needed, and no communication between nodes is required to generate all globally frequent patterns.
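
The three ideas in the abstract can be illustrated with a minimal Python sketch. This is a simplified, in-memory illustration and not the paper's implementation: a plain inverted index (item to transaction-id set) stands in for the disk-based Inverted Matrix layout, an explicit stack of transaction-id intersections stands in for the per-item co-occurrence tree, and all function names (`build_inverted_index`, `mine_for_item`, `mine_all`) are hypothetical. It assumes that each frequent item is responsible only for the patterns in which it is the lowest-ranked item, so the per-item tasks share no state and could each run on a separate node.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)


def mine_for_item(index, ranked, item, min_support):
    """Find frequent patterns whose lowest-ranked item is `item`.

    Only items ranked after `item` are considered, so no other task
    produces the same pattern. The loop is iterative (explicit stack),
    not recursive.
    """
    later = ranked[ranked.index(item) + 1:]
    found = {}
    # Each stack entry: (pattern so far, supporting tids, next index in `later`)
    stack = [((item,), index[item], 0)]
    while stack:
        pattern, tids, start = stack.pop()
        found[pattern] = len(tids)
        for i in range(start, len(later)):
            shared = tids & index[later[i]]
            if len(shared) >= min_support:
                stack.append((pattern + (later[i],), shared, i + 1))
    return found


def mine_all(transactions, min_support):
    """Run every per-item task in sequence and merge the results."""
    index = build_inverted_index(transactions)
    frequent = {i: t for i, t in index.items() if len(t) >= min_support}
    # Rank by ascending support, breaking ties by item name.
    ranked = sorted(frequent, key=lambda i: (len(frequent[i]), i))
    patterns = {}
    for item in ranked:  # in a parallel setting, one task per node
        patterns.update(mine_for_item(frequent, ranked, item, min_support))
    return patterns
```

Calling `mine_all(transactions, min_support)` with a list of item sets returns a dictionary from pattern tuples to support counts. Because each call to `mine_for_item` reads only the replicated index, the loop in `mine_all` could be distributed without any exchange of intermediate results, which is the property the abstract emphasizes.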