An efficient polynomial delay algorithm for pseudo frequent itemset mining

  • Authors:
  • Takeaki Uno; Hiroki Arimura

  • Affiliations:
  • National Institute of Informatics, Tokyo, Japan; Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan

  • Venue:
  • DS'07 Proceedings of the 10th international conference on Discovery science
  • Year:
  • 2007

Abstract

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and it has been extensively studied. In real-world use, one of the difficulties of frequent itemset mining is that real-world data is often incorrect or partly missing. As a result, some records that should include a pattern do not contain it. To deal with real-world problems, one can use an ambiguous inclusion relation and find patterns that are mostly included in many records. However, computational difficulty has prevented such approaches from being actively used in practice. In this paper, we use an alternative inclusion relation in which we consider an itemset P to be included in an itemset T if at most k items of P are not included in T, i.e., |P \ T| ≤ k. We address the problem of enumerating frequent itemsets under this inclusion relation and propose an efficient polynomial delay, polynomial space algorithm. Moreover, to enable us to skip many small, non-valuable frequent itemsets, we propose an algorithm for directly enumerating frequent itemsets of a certain size.
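
To make the inclusion relation concrete, the following is a minimal Python sketch of the definition |P \ T| ≤ k and of the support count it induces. It is a brute-force illustration only, not the polynomial delay algorithm proposed in the paper; the function names, the `min_support` threshold parameter, and the toy database are all assumptions introduced here for illustration.

```python
from itertools import combinations


def pseudo_includes(pattern, record, k):
    """True if at most k items of `pattern` are missing from `record`, i.e. |P \\ T| <= k."""
    return len(set(pattern) - set(record)) <= k


def pseudo_support(pattern, database, k):
    """Number of records that include `pattern` under the relaxed relation."""
    return sum(1 for record in database if pseudo_includes(pattern, record, k))


def pseudo_frequent_itemsets_of_size(database, size, k, min_support):
    """Naively enumerate all itemsets of exactly `size` items whose relaxed
    support is at least `min_support` (hypothetical threshold name).
    This tries every candidate, so it is exponential in general."""
    items = sorted(set().union(*database))
    return [
        frozenset(candidate)
        for candidate in combinations(items, size)
        if pseudo_support(candidate, database, k) >= min_support
    ]


# Toy database (assumed for illustration): each record is a set of items.
database = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {1, 3, 4}]

# With k = 0 the relation is ordinary set inclusion.
exact = pseudo_support({1, 2, 3}, database, k=0)
# With k = 1 a record may miss one item of the pattern.
relaxed = pseudo_support({1, 2, 3}, database, k=1)
```

In this toy database the pattern {1, 2, 3} is contained exactly in only one record, but every record misses at most one of its items, so the relaxed support with k = 1 covers all four records. The size-restricted function mirrors the idea of enumerating only itemsets of a certain size, which avoids reporting the many small itemsets that become trivially frequent once k items may be missing.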