An efficient polynomial delay algorithm for pseudo frequent itemset mining

Authors:
Takeaki Uno; Hiroki Arimura
Affiliations:
National Institute of Informatics, Tokyo, Japan; Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
Venue:
DS'07 Proceedings of the 10th international conference on Discovery science
Year:
2007

Abstract

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and it has been extensively studied. In real-world use, one of the difficulties of frequent itemset mining is that real-world data is often incorrect or partially missing. As a result, some records that should include a pattern do not. To deal with such data, one can use an ambiguous inclusion relation and find patterns that are mostly included in many records. However, the computational difficulty of such problems has prevented them from being actively used in practice. In this paper, we use an alternative inclusion relation in which an itemset P is considered to be included in an itemset T if at most k items of P are not included in T, i.e., |P \ T| ≤ k. We address the problem of enumerating frequent itemsets under this inclusion relation and propose an efficient polynomial delay, polynomial space algorithm. Moreover, to skip the many small, uninformative frequent itemsets, we propose an algorithm that directly enumerates frequent itemsets of a certain size.
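To make the inclusion relation |P \ T| ≤ k concrete, the following is a minimal Python sketch, assuming transactions and itemsets are represented as `frozenset`s of integers and that an itemset is frequent when its frequency is at least a threshold `theta`. The function names `included`, `frequency`, and `enumerate_frequent` are hypothetical. The enumeration is a naive backtracking search, not the authors' algorithm; it relies only on the fact that adding items to P can never shrink P \ T, so an infrequent itemset has no frequent supersets and its branch can be pruned.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[int]


def included(p: Itemset, t: Itemset, k: int) -> bool:
    """Ambiguous inclusion: P is included in T if at most k items of P are missing from T."""
    return len(p - t) <= k


def frequency(p: Itemset, db: Sequence[Itemset], k: int) -> int:
    """Count the transactions that include P under the ambiguous relation."""
    return sum(1 for t in db if included(p, t, k))


def enumerate_frequent(db: Sequence[Itemset], k: int, theta: int) -> List[Itemset]:
    """Naive depth-first enumeration of nonempty frequent itemsets (illustrative only).

    Each itemset is built by appending items in increasing order, so it is
    generated exactly once. Pruning is safe because frequency is anti-monotone:
    enlarging P can only enlarge the set difference P - T.
    """
    items = sorted(set().union(*db))
    result: List[Itemset] = []

    def extend(p: Itemset, last_index: int) -> None:
        # Try every item larger than the last one added to p.
        for i in range(last_index + 1, len(items)):
            q = p | {items[i]}
            if frequency(q, db, k) >= theta:
                result.append(q)
                extend(q, i)  # only frequent itemsets are extended

    extend(frozenset(), -1)
    return result


if __name__ == "__main__":
    db = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({2, 3}), frozenset({4})]
    for itemset in enumerate_frequent(db, k=1, theta=3):
        print(sorted(itemset), frequency(itemset, db, 1))
```

The sketch also shows why small itemsets carry little information under this relation: any itemset with at most k items satisfies |P \ T| ≤ |P| ≤ k for every transaction T, so it is included in every record and is trivially frequent. This is the motivation the abstract gives for enumerating frequent itemsets of a certain size directly.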