Parallel Algorithms for Discovery of Association Rules

Authors:
Mohammed J. Zaki;Srinivasan Parthasarathy;Mitsunori Ogihara;Wei Li
Affiliations:
Department of Computer Science, University of Rochester, Rochester, NY 14627.;Department of Computer Science, University of Rochester, Rochester, NY 14627.;Department of Computer Science, University of Rochester, Rochester, NY 14627.;Oracle Corporation, 500 Oracle Parkway, M/S 4op9, Redwood Shores, CA 94065. E-mail: weili@us.oracle.com
Venue:
Data Mining and Knowledge Discovery
Year:
1997

Abstract

Discovery of association rules is an important data mining task. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the set of frequent itemsets (a subset of database items), thus incurring high I/O overhead. In the parallel case, most algorithms perform a sum-reduction at the end of each pass to construct the global counts, also incurring high synchronization cost.

In this paper we describe new parallel association mining algorithms. The algorithms use novel itemset clustering techniques to approximate the set of potentially maximal frequent itemsets. Once this set has been identified, the algorithms make use of efficient traversal techniques to generate the frequent itemsets contained in each cluster. We propose two clustering schemes based on equivalence classes and maximal hypergraph cliques, and study two lattice traversal techniques based on bottom-up and hybrid search. We use a vertical database layout to cluster related transactions together. The database is also selectively replicated so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithms do not need any further communication or synchronization. The algorithms minimize I/O overheads by scanning the local database portion only twice: once in the set-up phase, and once when processing the itemset clusters. Unlike previous parallel approaches, the algorithms use simple intersection operations to compute frequent itemsets and do not have to maintain or search complex hash structures.

Our experimental testbed is a 32-processor DEC Alpha cluster inter-connected by the Memory Channel network. We present results on the performance of our algorithms on various databases, and compare it against a well-known parallel algorithm. The best new algorithm outperforms it by an order of magnitude.
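
The abstract's central idea, a vertical layout in which frequent itemsets are found by intersecting transaction-id lists within prefix-based equivalence classes, can be illustrated with a small sequential sketch. This is not the authors' implementation: the function names (`to_vertical`, `bottom_up`, `frequent_itemsets`) and the toy data are assumptions made for illustration, and the clique-based clustering, hybrid search, selective replication, and per-processor scheduling described in the abstract are all omitted.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Build the vertical layout: item -> set of ids of transactions containing it."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def bottom_up(eq_class, min_support, out):
    """Extend one equivalence class (itemsets sharing a common prefix).

    eq_class is a list of (itemset_tuple, tidset) pairs. Each pair of members
    is joined by intersecting their tid sets; no hash structure is searched.
    """
    for i, (a, tids_a) in enumerate(eq_class):
        next_class = []
        for b, tids_b in eq_class[i + 1:]:
            tids = tids_a & tids_b          # support of the joined itemset
            if len(tids) >= min_support:
                candidate = a + (b[-1],)    # shares prefix `a` with its siblings
                out[candidate] = len(tids)
                next_class.append((candidate, tids))
        if next_class:
            # In a parallel setting each class could be handled independently;
            # here the classes are simply processed one after another.
            bottom_up(next_class, min_support, out)


def frequent_itemsets(transactions, min_support):
    """Return {itemset_tuple: support_count} for all frequent itemsets."""
    vertical = to_vertical(transactions)
    root = [((item,), tids)
            for item, tids in sorted(vertical.items())
            if len(tids) >= min_support]
    frequent = {itemset: len(tids) for itemset, tids in root}
    bottom_up(root, min_support, frequent)
    return frequent


if __name__ == "__main__":
    # Toy database (an assumption for illustration only).
    db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
    for itemset, support in sorted(frequent_itemsets(db, 3).items()):
        print(itemset, support)
```

Because every itemset in a class carries its own tid set, extending the class needs no further access to the rest of the database, which is consistent with the abstract's claim that no communication is required after the set-up phase.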