Message-driven FP-growth

Authors: Jan Neerbek
Affiliations: Alexandra Institute, Aarhus, Denmark
Venue: Proceedings of the WICSA/ECSA 2012 Companion Volume
Year: 2012

Abstract

Frequent itemset mining finds frequently occurring itemsets in transactional data. It is applied to diverse problems such as decision support, selective marketing, financial forecasting, and medical diagnosis. The cloud, which offers computation as a utility service, allows us to tackle large mining problems. A number of algorithms exist for frequent itemset mining, but none is suited to the cloud out of the box, because they require large data structures to be synchronized across the network. One of the best algorithms for frequent itemset mining is the well-known FP-growth (Frequent Patterns growth). We develop a cloud-enabled algorithmic variant for frequent itemset mining that scales with very little communication and computational overhead and is faster than FP-growth even with only one worker node. We develop the concept of a postfix path and show how it lowers the communication cost and leads to adjustable work sizes. This concept provides a very flexible algorithmic solution that can be applied to a wide variety of problem sizes and setups.
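
To make the problem statement concrete, the following is a minimal sketch of what frequent itemset mining computes. It is a naive enumeration, not FP-growth and not the message-driven variant described in the abstract; the function name `frequent_itemsets`, the `min_support` parameter, and the toy transactions are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Naive illustration: count every itemset and keep the frequent ones.

    An itemset is frequent if it is contained in at least `min_support`
    transactions. This brute-force approach is exponential in the
    transaction length and only suitable for tiny examples.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy data: four shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

With a minimum support of 2, every single item and every pair is frequent in this toy data, while the triple appears in only one basket and is dropped. Algorithms such as FP-growth avoid this exhaustive enumeration by compressing the transactions into a tree structure first.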