Scalable parallel data mining for association rules

Authors:
Eui-Hong Han; George Karypis; Vipin Kumar
Affiliations:
Department of Computer Science, University of Minnesota, Minneapolis, MN (all three authors)
Venue:
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Year:
1997


Abstract

One of the important problems in data mining is discovering association rules from databases of transactions, where each transaction consists of a set of items. The most time-consuming operation in this discovery process is computing how frequently interesting subsets of items (called candidates) occur in the database of transactions. To prune the exponentially large space of candidates, most existing algorithms consider only those candidates that have a user-defined minimum support. Even with this pruning, the task of finding all association rules requires a lot of computation power and time. Parallel computers offer a potential solution to the computation requirement of this task, provided efficient and scalable parallel algorithms can be designed. In this paper, we present two new parallel algorithms for mining association rules. The Intelligent Data Distribution algorithm efficiently uses the aggregate memory of the parallel computer by employing an intelligent candidate partitioning scheme, and it uses an efficient communication mechanism to move data among the processors. The Hybrid Distribution algorithm further improves upon the Intelligent Data Distribution algorithm by dynamically partitioning the candidate set to maintain good load balance. Experimental results on a Cray T3D parallel computer show that the Hybrid Distribution algorithm scales linearly, exploits the aggregate memory better, and can generate more association rules with a single scan of the database per pass.
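
To make the counting step concrete, the following is a minimal serial sketch of level-wise candidate counting with minimum-support pruning, in the style of the Apriori approach. It is not the Intelligent Data Distribution or Hybrid Distribution algorithm from the paper, and it involves no parallelism. The function name `frequent_itemsets`, the toy transactions, and the choice to express `min_support` as an absolute transaction count are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    `min_support` transactions (hypothetical helper; support is assumed
    to be an absolute count, not a fraction)."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count individual items and keep the frequent ones.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Candidate generation: unions of two frequent (k-1)-itemsets
        # that have size k and whose (k-1)-subsets are all frequent.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Counting: one scan of the transaction database per pass.
        counts = Counter()
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result


# Toy data (made up for illustration only).
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
found = frequent_itemsets(toy, min_support=3)
```

In this sketch the inner counting loop is the part the abstract identifies as the most time-consuming, since every candidate is checked against every transaction in each pass. A parallel formulation would have to decide how to split the candidates and the transactions across processors.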