Apriori-based frequent itemset mining algorithms on MapReduce

Authors:
Ming-Yen Lin;Pei-Yu Lee;Sue-Chen Hsueh
Affiliations:
Feng Chia University, Taichung, Taiwan;Feng Chia University, Taichung, Taiwan;Chaoyang University of Technology, Taichung, Taiwan
Venue:
Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
Year:
2012

Citing 22
Cited 1

Fast sequential and parallel algorithms for association rule mining: a comparison

Fast sequential and parallel algorithms for association rule mining: a comparison
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Hash based parallel algorithms for mining association rules

DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
Real world performance of association rule algorithms

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Parallel and Distributed Association Mining: A Survey

IEEE Concurrency
Parallel Mining of Association Rules

IEEE Transactions on Knowledge and Data Engineering
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
A sampling-based framework for parallel data mining

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pfp: parallel fp-growth for query recommendation

Proceedings of the 2008 ACM conference on Recommender systems
A comparison of approaches to large-scale data analysis

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
SQL in the Clouds

Computing in Science and Engineering
MapReduce and parallel DBMSs: friends or foes?

Communications of the ACM - Amir Pnueli: Ahead of His Time
MapReduce: a flexible data processing tool

Communications of the ACM - Amir Pnueli: Ahead of His Time
A load-balanced distributed parallel mining algorithm

Expert Systems with Applications: An International Journal
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Proceedings of the VLDB Endowment
Efficient parallel set-similarity joins using MapReduce

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
The Hadoop Distributed File System

MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
The performance of MapReduce: an in-depth study

Proceedings of the VLDB Endowment
Batch text similarity search with MapReduce

APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
DPSP: distributed progressive sequential pattern mining on the cloud

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
MapReduce indexing strategies: Studying scalability and efficiency

Information Processing and Management: an International Journal

Efficient mining of frequent itemsets in social network data based on MapReduce framework

Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many parallelization techniques have been proposed to enhance the performance of the Apriori-like frequent itemset mining algorithms. Characterized by both map and reduce functions, MapReduce has emerged and excels in the mining of datasets of terabyte scale or larger in either homogeneous or heterogeneous clusters. Minimizing the scheduling overhead of each map-reduce phase and maximizing the utilization of nodes in each phase are keys to successful MapReduce implementations. In this paper, we propose three algorithms, named SPC, FPC, and DPC, to investigate effective implementations of the Apriori algorithm in the MapReduce framework. DPC features in dynamically combining candidates of various lengths and outperforms both the straight-forward algorithm SPC and the fixed passes combined counting algorithm FPC. Extensive experimental results also show that all the three algorithms scale up linearly with respect to dataset sizes and cluster sizes.