Scalable Parallel Data Mining for Association Rules

Authors:
Eui-Hong (Sam) Han; George Karypis; Vipin Kumar
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2000

Abstract

In this paper, we propose two new parallel formulations of the Apriori algorithm that is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller communication overhead than DD. But IDD suffers from the added cost due to communication of transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.
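The contrast the abstract draws between partitioning transactions (as in CD) and partitioning candidates (as in IDD) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, a plain set lookup stands in for the hash tree, processors are simulated by a loop, and candidates are assigned to processors round-robin by first item, which is a simplification rather than the partitioning scheme used in the paper.

```python
from collections import Counter
from itertools import combinations


def count_local(transactions, candidates, k):
    """Count occurrences of each candidate k-itemset in the given transactions.

    Candidates are sorted tuples; a set lookup stands in for the hash tree.
    """
    wanted = set(candidates)
    counts = Counter()
    for t in transactions:
        for subset in combinations(sorted(t), k):
            if subset in wanted:
                counts[subset] += 1
    return counts


def count_distribution(transactions, candidates, k, p):
    """CD-style sketch: each simulated processor holds ALL candidates,
    counts only its own slice of the transactions, and the local counts
    are summed (standing in for a global reduction)."""
    total = Counter()
    for rank in range(p):
        local_transactions = transactions[rank::p]
        total.update(count_local(local_transactions, candidates, k))
    return total


def candidate_partitioned(transactions, candidates, k, p):
    """IDD-flavoured sketch: candidates are split by first item (round-robin,
    an assumption), so each simulated processor holds a disjoint subset of
    candidates but has to see every transaction."""
    first_items = sorted({c[0] for c in candidates})
    owner = {item: i % p for i, item in enumerate(first_items)}
    total = Counter()
    for rank in range(p):
        mine = [c for c in candidates if owner[c[0]] == rank]
        total.update(count_local(transactions, mine, k))
    return total
```

Both functions yield the same support counts; what differs is what each simulated processor must hold and see. In the first, every processor stores the full candidate set but reads only its share of the transactions. In the second, every processor stores a fraction of the candidates but reads all transactions, which corresponds to the transaction-communication cost the abstract attributes to IDD.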