Efficient parallel data mining for association rules
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
A Parallel Distributive Join Algorithm for Cube-Connected Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Dynamic itemset counting and implication rules for market basket data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Clustering transactions using large items
Proceedings of the eighth international conference on Information and knowledge management
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Generating non-redundant association rules
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Depth first generation of long patterns
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A fast distributed algorithm for mining association rules
DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
Multipass algorithms for mining association rules in text databases
Knowledge and Information Systems
MPI: The Complete Reference
Parallel Algorithms for Discovery of Association Rules
Data Mining and Knowledge Discovery
Mining association rules using inverted hashing and pruning
Information Processing Letters
Parallel Mining of Association Rules
IEEE Transactions on Knowledge and Data Engineering
Using a Hash-Based Method with Transaction Trimming for Mining Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Parallel Data Mining for Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Algorithms for Association Mining
IEEE Transactions on Knowledge and Data Engineering
Effect of Data Skewness and Workload Balance in Parallel Data Mining
IEEE Transactions on Knowledge and Data Engineering
Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set
IEEE Transactions on Knowledge and Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
CLOPE: a fast and effective clustering algorithm for transactional data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
ROCK: A Robust Clustering Algorithm for Categorical Attributes
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
New Algorithms for Fast Discovery of Association Rules
New Algorithms for Fast Discovery of Association Rules
Performance study of distributed Apriori-like frequent itemsets mining
Knowledge and Information Systems
Mining fuzzy association rules from uncertain data
Knowledge and Information Systems
A parallel algorithm for computing borders
Proceedings of the 20th ACM international conference on Information and knowledge management
Hi-index | 0.00 |
In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named distributed max-miner (DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named parallel max-miner (PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent k-itemsets on the kth pass over the databases, and then generate the candidate (k + 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases.