A high-performance distributed algorithm for mining association rules

Authors:
Assaf Schuster; Ran Wolff; Dan Trock
Affiliations:
Technion—Israel Institute of Technology, Department of Computer Science, 32000, Haifa, Israel; Technion—Israel Institute of Technology, Department of Computer Science, 32000, Haifa, Israel; Technion—Israel Institute of Technology, Department of Electrical Engineering, 32000, Haifa, Israel
Venue:
Knowledge and Information Systems
Year:
2005

Abstract

We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. It is the first D-ARM algorithm to perform only a single scan over the database, and as a result its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks show that run time remains stable regardless of the number of computers. Theoretical analysis yields a tighter bound on the error probability than the one shown for the corresponding sequential algorithm. Owing to this tighter bound, and by using the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms: the same order of magnitude as the optimum.
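The abstract connects a tighter error bound and a larger combined memory to fewer candidates, but it does not state the bound itself. The sketch below illustrates that line of reasoning under an assumption the abstract does not confirm: that itemsets are mined from a random sample held in memory at a lowered support threshold, with the error controlled by a one-sided Hoeffding (additive Chernoff) bound, P[f_s < f - ε] ≤ exp(-2ε²n) for a sample of n transactions. This is a generic textbook bound, not the paper's own. The function names, the 2% minimum support, δ = 0.01, and the per-node capacity of 10,000 transactions are all hypothetical.

```python
import math


def hoeffding_epsilon(sample_size: int, delta: float) -> float:
    """Deviation eps for which the one-sided Hoeffding bound
    P[sample_freq < true_freq - eps] <= exp(-2 * eps**2 * n) equals delta.

    Assumption: transactions in the sample are drawn independently and
    uniformly at random from the database.
    """
    return math.sqrt(math.log(1.0 / delta) / (2.0 * sample_size))


def lowered_threshold(min_support: float, sample_size: int, delta: float) -> float:
    """Hypothetical helper: the support threshold to use on the sample so that
    any single itemset with true frequency >= min_support falls below it with
    probability at most delta. Clamped at 0 for very small samples."""
    return max(0.0, min_support - hoeffding_epsilon(sample_size, delta))


if __name__ == "__main__":
    MIN_SUPPORT = 0.02   # hypothetical minimum support (2%)
    DELTA = 0.01         # hypothetical per-itemset miss probability
    PER_NODE = 10_000    # hypothetical: transactions fitting in one node's memory

    # Pooling the memory of more nodes allows a larger sample, which shrinks
    # eps and raises the lowered threshold toward MIN_SUPPORT.
    for nodes in (1, 2, 4, 8):
        n = nodes * PER_NODE
        print(f"nodes={nodes} sample={n} "
              f"threshold={lowered_threshold(MIN_SUPPORT, n, DELTA):.5f}")
```

Because ε scales as 1/√n, quadrupling the sample (for example, pooling the memory of four nodes) halves ε. The lowered threshold then sits closer to the true minimum support, so fewer itemsets that are actually infrequent pass it as candidates.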