Message-driven FP-growth

  • Authors: Jan Neerbek
  • Affiliations: Alexandra Institute, Aarhus, Denmark
  • Venue: Proceedings of the WICSA/ECSA 2012 Companion Volume
  • Year: 2012

Abstract

Frequent itemset mining finds frequently occurring itemsets in transactional data. It is applied to diverse problems such as decision support, selective marketing, financial forecasting and medical diagnosis. The cloud, that is, computation as a utility service, allows us to crunch large mining problems. There are a number of algorithms for frequent itemset mining, but none is suited to the cloud out of the box, because they require large data structures to be synchronized across the network. One of the best algorithms for frequent itemset mining is the well-known FP-growth (Frequent Pattern growth). We develop a cloud-enabled algorithmic variant for frequent itemset mining that scales with very little communication and computational overhead and is faster than FP-growth even with only one worker node. We develop the concept of a postfix path and show how it lowers the communication cost and leads to adjustable work sizes. This concept provides a very flexible algorithmic solution that can be applied to a wide variety of problem sizes and setups.
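
To make the problem statement concrete, the following is a minimal sketch of what "finding frequently occurring itemsets" means. It is a brute-force illustration only: it is neither FP-growth nor the message-driven variant described in the abstract, and it enumerates every subset of every transaction, which is exponential in transaction length. The function name `frequent_itemsets`, the parameter `min_support` and the sample transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Brute-force illustration of the problem definition (hypothetical helper,
    not the paper's algorithm and not FP-growth).
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicate items within a transaction
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Made-up sample data for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a support threshold of 2, every single item and every pair is frequent in this sample, while the full triple appears in only one transaction and is dropped. Algorithms such as FP-growth reach the same result without enumerating all candidate subsets.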