A probability analysis for candidate-based frequent itemset algorithms

  • Authors: Nele Dexters; Paul W. Purdom; Dirk Van Gucht

  • Affiliations: University of Antwerp, Antwerp, Belgium; Indiana University, Bloomington, Indiana; Indiana University, Bloomington, Indiana

  • Venue: Proceedings of the 2006 ACM Symposium on Applied Computing
  • Year: 2006

Abstract

This paper examines candidate generation, an important step in frequent itemset mining algorithms, from a theoretical point of view. The key notions in our probabilistic analysis are success (a candidate that is frequent) and failure (a candidate that is infrequent). For a selection of candidate-based frequent itemset mining algorithms, the probabilities of these events are studied under the shopping model, in which all shoppers are independent and each combination of items has its own probability, so any correlation between items is possible. The Apriori algorithm is considered in detail; for AIS, Eclat, FP-growth, and the Fast Completion Apriori algorithm, the main principles are sketched. The results of the analysis are used to compare the behaviour of the algorithms for a variety of data distributions.
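To make the notions of success and failure concrete, the following is a minimal Python sketch of a level-wise, Apriori-style search that tallies how many candidates at each level turn out frequent (successes) or infrequent (failures). It is an illustration only, not the authors' analysis or implementation: the function name `apriori_with_outcomes`, the absolute support threshold `min_support`, and the toy baskets are all assumptions introduced here.

```python
from itertools import combinations


def apriori_with_outcomes(baskets, min_support):
    """Level-wise search that records, per level, the number of candidates
    that are frequent (successes) and infrequent (failures).

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    baskets = [frozenset(b) for b in baskets]
    items = sorted({item for b in baskets for item in b})

    # Level 1: every single item is a candidate.
    candidates = [frozenset([item]) for item in items]
    frequent = {}   # frequent itemset -> support count
    outcomes = []   # (level, successes, failures)
    k = 1

    while candidates:
        # Count the support of each candidate by scanning all baskets.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level_frequent = {c for c, n in counts.items() if n >= min_support}
        outcomes.append(
            (k, len(level_frequent), len(candidates) - len(level_frequent))
        )
        frequent.update({c: counts[c] for c in level_frequent})

        # A (k+1)-itemset becomes a candidate only if all of its
        # k-subsets were found frequent at this level.
        next_candidates = set()
        for a in level_frequent:
            for b in level_frequent:
                union = a | b
                if len(union) == k + 1 and all(
                    frozenset(s) in level_frequent
                    for s in combinations(union, k)
                ):
                    next_candidates.add(union)

        candidates = sorted(next_candidates, key=sorted)
        k += 1

    return frequent, outcomes


# Toy baskets, invented for illustration (not data from the paper).
toy_baskets = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
    {"d"},
]
frequent, outcomes = apriori_with_outcomes(toy_baskets, min_support=3)
print(outcomes)  # [(1, 3, 1), (2, 3, 0), (3, 0, 1)]
```

In this toy run, item `d` is a failure at level 1, all three pairs are successes at level 2, and the triple `{a, b, c}` is a failure at level 3: each of its pairs is frequent, so it must be generated and counted, yet it appears in only two baskets.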