Parallel Leap: Large-Scale Maximal Pattern Mining in a Distributed Environment

Authors:
Mohammad El-Hajj; Osmar R. Zaiane
Affiliations:
University of Alberta, Edmonton, Canada; University of Alberta, Edmonton, Canada
Venue:
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Year:
2006

Abstract

When computationally feasible, mining extremely large databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets because of their sheer size: not only the number of existing patterns but, above all, the magnitude of the search space. Many approaches have been suggested, such as sequentially mining for maximal patterns or searching for all frequent patterns in parallel. So far, those approaches have not been genuinely effective at mining extremely large datasets. In this work we propose a method that combines both strategies efficiently, i.e., mining in parallel for the set of maximal patterns, which, to the best of our knowledge, has never been done efficiently before. Using this approach we could mine very large datasets, of sizes never before reported in the literature. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than 2 hours.
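
To illustrate why maximal patterns are a more compact target than the full set of frequent patterns, the following is a minimal brute-force sketch on a toy dataset. It is not the Parallel Leap algorithm, and it is neither parallel nor scalable; the function names, the toy transactions, and the support threshold are all assumptions made for this example. A frequent itemset is treated as maximal when no proper superset of it is also frequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: map every itemset with support >= min_support to its support."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing the whole candidate.
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent[cset] = support
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy data, chosen only for illustration.
transactions = [
    frozenset(t)
    for t in ({"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"d"})
]

frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)

print(len(frequent), "frequent itemsets")
print(sorted(sorted(s) for s in maximal), "are maximal")
```

On this toy input the eight frequent itemsets are summarized by two maximal ones, {a, b, c} and {d}, since every frequent itemset is a subset of one of them. The maximal set alone does not record the supports of the subsets.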