Efficient mining of emerging patterns: discovering trends and differences. KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
Mining frequent patterns without candidate generation. SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data.
Parallel data mining for association rules on shared memory systems. Knowledge and Information Systems.
Detecting Group Differences: Mining Contrast Sets. Data Mining and Knowledge Discovery.
Parallel and Distributed Association Mining: A Survey. IEEE Concurrency.
Parallel Leap: Large-Scale Maximal Pattern Mining in a Distributed Environment. ICPADS '06: Proceedings of the 12th International Conference on Parallel and Distributed Systems, Volume 1.
Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery.
MapReduce: simplified data processing on large clusters. Communications of the ACM, 50th anniversary issue: 1958-2008.
A survey on algorithms for mining frequent itemsets over data streams. Knowledge and Information Systems.
PFP: parallel FP-growth for query recommendation. Proceedings of the 2008 ACM conference on Recommender systems.
Weighted random sampling with a reservoir. Information Processing Letters.
Direct local pattern sampling by efficient two-step random procedures. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.
Linear space direct pattern sampling using coupling from the past. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining.
Frequent itemset mining is a well-studied and important problem in the data mining community. An abundance of mining algorithms exists, each with its own flavor and characteristics, but almost all suffer from two major shortcomings. First, frequent itemset mining algorithms generally perform an exhaustive search over a huge pattern space. Second, most algorithms assume that the input data fits into main memory. The first problem was recently tackled in the work of [2] by directly sampling the required number of patterns from the pattern space. This paper extends the direct sampling approach by casting the algorithm into the MapReduce framework, effectively removing the requirement that the data fit into main memory. The results show that the algorithm scales well to large data sets, while its memory requirements depend solely on the required number of patterns in the output.
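The abstract does not spell out the sampling procedure or how it is divided between mappers and reducers. As a rough illustration only, the sketch below assumes the frequency-proportional variant of the two-step procedure described in "Direct local pattern sampling by efficient two-step random procedures": first draw a transaction t with probability proportional to 2^|t|, then draw a uniform random subset of t, so that an itemset X is returned with probability proportional to its support. The MapReduce side is simulated in-process using exponential races in the spirit of "Weighted random sampling with a reservoir": each mapper keeps k independent size-1 weighted reservoirs over its data split, and a single reducer takes the slot-wise winner. The names `map_split`, `reduce_slots` and `sample_patterns` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Iterable, List, Optional, Sequence, Tuple

Transaction = Tuple[str, ...]
# One reservoir slot: (race key, transaction). The smallest key wins.
Slot = Optional[Tuple[float, Transaction]]


def _positive_exp(rng: random.Random) -> float:
    """Draw E ~ Exp(1), rejecting the (astronomically rare) exact zero."""
    while True:
        e = rng.expovariate(1.0)
        if e > 0.0:
            return e


def map_split(split: Iterable[Transaction], k: int, rng: random.Random) -> List[Slot]:
    """Hypothetical mapper: k independent weighted exponential races over one split.

    Transaction t has weight w = 2**len(t). Its key in each slot is
    log(E) - len(t) * log(2), i.e. log(E / w) computed without overflow.
    The smallest key wins, so each slot selects t with probability w_t / W.
    Memory is O(k), independent of the number of transactions in the split.
    """
    slots: List[Slot] = [None] * k
    for t in split:
        log_w = len(t) * math.log(2.0)
        for i in range(k):
            key = math.log(_positive_exp(rng)) - log_w
            if slots[i] is None or key < slots[i][0]:
                slots[i] = (key, t)
    return slots


def reduce_slots(partials: Sequence[List[Slot]], k: int) -> List[Transaction]:
    """Hypothetical reducer: slot-wise minimum key across all mapper outputs."""
    winners: List[Transaction] = []
    for i in range(k):
        candidates = [p[i] for p in partials if p[i] is not None]
        winners.append(min(candidates)[1])
    return winners


def sample_patterns(splits: Sequence[Sequence[Transaction]], k: int,
                    seed: int = 0) -> List[frozenset]:
    """Draw k itemsets, each with probability proportional to its support."""
    rng = random.Random(seed)
    partials = [map_split(s, k, rng) for s in splits]  # simulated map phase
    transactions = reduce_slots(partials, k)            # simulated reduce phase
    # Step 2: keep each item with probability 1/2, which yields a uniform
    # random subset of the chosen transaction.
    return [frozenset(x for x in t if rng.random() < 0.5) for t in transactions]


splits = [[("a", "b"), ("a",)], [("b", "c", "d")]]
print(sample_patterns(splits, k=5, seed=1))
```

Each slot is an independent draw with replacement, which is what independent pattern samples require; a single size-k reservoir would instead pick transactions without replacement. Mapper memory is O(k) regardless of split size, in line with the abstract's claim that memory depends only on the number of requested patterns, although this sketch spends O(k) random draws per transaction and is not tuned for throughput. Under this distribution the empty itemset is also a possible output.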