Distributed Mining of Maximal Frequent Itemsets on a Data Grid System

Authors:
Congnan Luo; Anil L. Pereira; Soon M. Chung
Affiliations:
Department of Computer Science and Engineering, Wright State University, Dayton, USA 45435; Department of Computer Science and Engineering, Wright State University, Dayton, USA 45435; Department of Computer Science and Engineering, Wright State University, Dayton, USA 45435
Venue:
The Journal of Supercomputing
Year:
2006

Abstract

In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its proper supersets is frequent. GridDMM is particularly well suited to Grid environments because of its low communication and synchronization overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines its local database to discover the local maximal frequent itemsets; together, these itemsets form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations using the open-source Globus Toolkit, and evaluated the GridDMM algorithm in terms of performance, scalability, and the overhead of communication and synchronization. GridDMM demonstrates better performance than other sequential and parallel algorithms, and its performance scales with both the database size and the number of nodes.
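The two-phase idea described in the abstract can be illustrated with a small Python sketch. This is not the GridDMM implementation: it assumes each node applies the same relative minimum support to its local partition, it uses brute-force enumeration in place of Max-Miner, and it uses plain sets instead of the paper's prefix tree. All function names (`support`, `frequent_itemsets`, `maximal`, `global_mfis`) are hypothetical. The sketch relies on the fact that an itemset that is frequent over the whole database must be frequent in at least one partition, so every global maximal frequent itemset is a subset of some local one.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_frac):
    """Brute-force level-wise enumeration (a stand-in for Max-Miner)."""
    min_count = min_frac * len(transactions)
    items = sorted(set().union(*transactions))
    result = set()
    for k in range(1, len(items) + 1):
        level = {
            frozenset(c)
            for c in combinations(items, k)
            if support(frozenset(c), transactions) >= min_count
        }
        if not level:  # no frequent k-itemsets, so no larger ones either
            break
        result |= level
    return result


def maximal(itemsets):
    """Keep only the itemsets that have no proper superset in the collection."""
    return {s for s in itemsets if not any(s < other for other in itemsets)}


def global_mfis(partitions, min_frac):
    """Local phase, then a top-down global phase over the candidates."""
    total = sum(len(p) for p in partitions)
    min_count = min_frac * total

    # Local phase: each partition (node) reports its local maximal itemsets.
    candidates = set()
    for p in partitions:
        candidates |= maximal(frequent_itemsets(p, min_frac))

    # Global phase: count candidates top-down; an infrequent candidate is
    # replaced by its subsets that are one item smaller.
    frontier, seen, found = maximal(candidates), set(), set()
    while frontier:
        next_frontier = set()
        for c in frontier:
            if c in seen:
                continue
            seen.add(c)
            # The global count is the sum of the per-node counts.
            if sum(support(c, p) for p in partitions) >= min_count:
                found.add(c)
            elif len(c) > 1:
                next_frontier |= {c - {item} for item in c}
        frontier = next_frontier
    return maximal(found)


node1 = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("d")]
node2 = [frozenset("ab"), frozenset("bc"), frozenset("bc"), frozenset("cd")]
result = global_mfis([node1, node2], min_frac=0.5)
print(sorted("".join(sorted(s)) for s in result))  # ['ab', 'bc']
```

In this toy run, node 1 reports `{a, b, c}` and node 2 reports `{b, c}` as local maximal frequent itemsets. The global count shows that `{a, b, c}` is infrequent, so the search descends to its two-item subsets and finds `{a, b}` and `{b, c}` as the global maximal frequent itemsets.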