Frequent itemset mining finds itemsets that occur frequently in transactional data. It is applied to diverse problems such as decision support, selective marketing, financial forecasting, and medical diagnosis. The cloud, which offers computation as a utility service, makes it feasible to tackle large mining problems. Many algorithms exist for frequent itemset mining, but none is suited to the cloud out of the box, because they require large data structures to be synchronized across the network. One of the best algorithms for frequent itemset mining is the well-known FP-growth (Frequent Pattern growth). We develop a cloud-enabled algorithmic variant for frequent itemset mining that scales with very little communication and computational overhead and is faster than FP-growth even with only one worker node. We introduce the concept of a postfix path and show how it lowers the communication cost and leads to adjustable work sizes. This concept yields a very flexible algorithmic solution that can be applied to a wide variety of problem sizes and setups.
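To make the problem statement concrete, the following is a minimal brute-force sketch of frequent itemset mining in Python. It is not FP-growth and not the postfix-path variant described above; it only illustrates what "an itemset occurring in at least a minimum number of transactions" means. The function name `frequent_itemsets`, the parameter `min_support`, and the sample transactions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Brute force: enumerates every subset of every transaction, so it is
    only usable on tiny inputs. Illustrative only, not the paper's method.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical sample data: four shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

The exponential subset enumeration in this sketch is the cost that pattern-growth algorithms such as FP-growth are designed to avoid.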