Efficient mining of maximal frequent itemsets from databases on a cluster of workstations

Authors:
Soon M. Chung; Congnan Luo
Affiliations:
Wright State University, Department of Computer Science and Engineering, Dayton, OH, USA; Wright State University, Department of Computer Science and Engineering, Dayton, OH, USA
Venue:
Knowledge and Information Systems
Year:
2008

Abstract

In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. The first algorithm, distributed max-miner (DMM), requires very low communication and synchronization overhead in distributed computing systems. DMM has two phases: a local mining phase and a global mining phase. During the local mining phase, each node mines its local database to discover the local maximal frequent itemsets, which together form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of global candidate itemsets of different sizes. The global mining phase using the prefix tree can work with any local mining algorithm. The second algorithm, parallel max-miner (PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most existing mining algorithms discover the frequent k-itemsets on the kth pass over the database and then generate the candidate (k + 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in the databases.
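
The definition of a maximal frequent itemset and the local-then-global idea described in the abstract can be illustrated with a small single-process sketch. This is not the authors' DMM implementation: the function names (`support`, `maximal_frequent`, `two_phase_maximal`), the brute-force local miner, the use of a relative support threshold per partition, and the plain set-based counting (in place of the paper's prefix tree and of real inter-node communication) are all assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent(transactions, min_ratio):
    """Brute-force miner (illustrative only): enumerate every itemset,
    keep the frequent ones, then keep those with no frequent proper superset."""
    items = sorted(set().union(*transactions))
    min_count = min_ratio * len(transactions)
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= min_count
    ]
    return {f for f in frequent if not any(f < g for g in frequent)}


def two_phase_maximal(partitions, min_ratio):
    """Hypothetical two-phase sketch: local mining, then top-down global search."""
    # Local phase: each partition stands in for one node's local database.
    candidates = set()
    for part in partitions:
        candidates |= maximal_frequent(part, min_ratio)

    # Global phase: count candidates over all data; an infrequent candidate
    # is replaced by its subsets that are one item smaller (top-down search).
    all_tx = [t for part in partitions for t in part]
    min_count = min_ratio * len(all_tx)
    frequent, seen = set(), set()
    frontier = set(candidates)
    while frontier:
        next_frontier = set()
        for cand in frontier:
            if cand in seen:
                continue
            seen.add(cand)
            if support(cand, all_tx) >= min_count:
                frequent.add(cand)
            elif len(cand) > 1:
                next_frontier |= {cand - {x} for x in cand}
        frontier = next_frontier

    # Keep only itemsets that have no frequent proper superset.
    return {f for f in frequent if not any(f < g for g in frequent)}


# Made-up example data: two partitions of four transactions each.
node1 = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("d")]
node2 = [frozenset("abd"), frozenset("bd"), frozenset("ad"), frozenset("c")]
result = two_phase_maximal([node1, node2], 0.5)
```

The sketch relies on a standard partitioning argument: with the same relative threshold everywhere, an itemset that is frequent in the whole database must be frequent in at least one partition, so every global maximal frequent itemset is a subset of some local maximal frequent itemset and is reachable by the top-down search.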