A Parallel Distributive Join Algorithm for Cube-Connected Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Efficiently mining long patterns from databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The grid
Depth first generation of long patterns
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Parallel Algorithms for Discovery of Association Rules
Data Mining and Knowledge Discovery
Parallel Mining of Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Parallel Data Mining for Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Algorithms for Association Mining
IEEE Transactions on Knowledge and Data Engineering
Effect of Data Skewness and Workload Balance in Parallel Data Mining
IEEE Transactions on Knowledge and Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
A Resource Management Architecture for Metacomputing Systems
IPPS/SPDP '98 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
MPICH-G2: a Grid-enabled implementation of the Message Passing Interface
Journal of Parallel and Distributed Computing - Special issue on computational grids
New Algorithms for Fast Discovery of Association Rules
New Algorithms for Fast Discovery of Association Rules
Data integration in a bandwidth-rich world
Communications of the ACM - Blueprint for the future of high-performance networking
The Anatomy of the Grid: Enabling Scalable Virtual Organizations
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its supersets is frequent. GridDMM is specifically suitable for use in Grid environments due to low communication and synchronization overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations using the open-source Globus Toolkit, and evaluated the GridDMM algorithm in terms of performance, scalability, and the overhead of communication and synchronization. GridDMM demonstrates better performance than other sequential and parallel algorithms, and its performance is scalable in terms of the database size and the number of nodes.