Parallel Bifold: Large-scale parallel pattern mining with constraints

Authors:
Mohammad El-Hajj;Osmar R. Zaïane
Affiliations:
Department of Computing Science, University of Alberta, Edmonton, Canada;Department of Computing Science, University of Alberta, Edmonton, Canada
Venue:
Distributed and Parallel Databases
Year:
2006

Abstract

When computationally feasible, mining huge databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets because of their sheer size: not only the number of existing patterns but, above all, the magnitude of the search space. Many approaches have suggested either applying constraints to the patterns or searching for frequent patterns in parallel. So far, neither approach has proved genuinely effective for mining extremely large datasets. We propose a method that combines both strategies efficiently, i.e., mining in parallel for the set of patterns while pushing constraints. Using this approach, we could mine very large datasets, of sizes never before reported in the literature. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than an hour and a half.