Direct out-of-memory distributed parallel frequent pattern mining

Authors:
Zheyi Rong;Jeroen De Knijf
Affiliations:
TU Eindhoven, the Netherlands;TU Eindhoven, the Netherlands
Venue:
Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Year:
2013

Citing 13
Cited 0

Efficient mining of emerging patterns: discovering trends and differences

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Parallel data mining for association rules on shared memory systems

Knowledge and Information Systems
Detecting Group Differences: Mining Contrast Sets

Data Mining and Knowledge Discovery
Parallel and Distributed Association Mining: A Survey

IEEE Concurrency
Parallel Leap: Large-Scale Maximal Pattern Mining in a Distributed Environment

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Frequent pattern mining: current status and future directions

Data Mining and Knowledge Discovery
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
A survey on algorithms for mining frequent itemsets over data streams

Knowledge and Information Systems
Pfp: parallel fp-growth for query recommendation

Proceedings of the 2008 ACM conference on Recommender systems
Weighted random sampling with a reservoir

Information Processing Letters
Direct local pattern sampling by efficient two-step random procedures

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Linear space direct pattern sampling using coupling from the past

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Frequent itemset mining is a well studied and important problem in the datamining community. An abundance of different mining algorithms exists, all with different flavor and characteristics, but almost all suffer from two major shortcomings. First, in general frequent itemset mining algorithms perform exhaustive search over a huge pattern space. Second, most algorithms assume that the input data fits into main memory. The first problem was recently tackled in the work of [2], by direct sampling the required number of patterns over the pattern space. This paper extends the direct sampling approach by casting the algorithm into the MapReduce framework, effectively ceasing the memory requirements that the data should fit into main memory. The results show that the algorithm scales well for large data sets, while the memory requirements are solely dependent on the required number of patterns in the output.