Ambiguous frequent itemset mining and polynomial delay enumeration

Authors:
Takeaki Uno;Hiroki Arimura
Affiliations:
National Institute of Informatics, Tokyo, Japan;Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
Venue:
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2008



Abstract

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, called transactions, the problem is called the frequent itemset mining problem, and it has been extensively studied. The items in a frequent itemset appear in many records simultaneously, so they can be regarded as a cluster with respect to these records. In this sense, however, the condition that every item appears in each record is quite strong; we should allow for several missing items in these records. In this paper, we approach this problem from the viewpoint of algorithm theory, and consider a model that can be solved efficiently and is possibly valuable in practice. We introduce ambiguous frequent itemsets, which allow missing items in their occurrence records. More precisely, for given thresholds θ and σ, an ambiguous frequent itemset P has a transaction set τ, |τ| ≥ σ, such that on average, transactions in τ include ratio θ of the items of P. We formulate the problem of enumerating ambiguous frequent itemsets, and propose an efficient polynomial delay, polynomial space algorithm. Its practical performance is evaluated by computational experiments. Our algorithm can be naturally extended to the weighted version of the problem. The weighted version is a natural extension of ordinary frequent itemset mining to weighted transaction databases, and is equivalent to finding submatrices with large average weights in their cells. An implementation is available at the author's homepage.
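The definition in the abstract can be made concrete with a small check for a single candidate itemset. The sketch below is not the paper's enumeration algorithm; it only tests the stated condition, reading "ratio θ" as "at least θ on average" (an assumption). The function name `is_ambiguous_frequent` and the toy transactions are hypothetical. It relies on one observation: if transactions are sorted by how many items of P they contain, the average over the top k can only decrease as k grows, so it is enough to test the σ transactions with the largest overlap.

```python
from typing import FrozenSet, Sequence


def is_ambiguous_frequent(
    pattern: FrozenSet[str],
    transactions: Sequence[FrozenSet[str]],
    theta: float,
    sigma: int,
) -> bool:
    """Check the abstract's condition for one itemset (illustrative only).

    Returns True if some transaction set tau with |tau| >= sigma exists
    whose members contain, on average, at least a theta fraction of the
    items of `pattern`. Hypothetical helper, not taken from the paper.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty (the ratio is undefined)")
    if sigma < 1 or len(transactions) < sigma:
        return False

    # Number of pattern items present in each transaction, largest first.
    overlaps = sorted((len(pattern & t) for t in transactions), reverse=True)

    # Prefix averages of a descending sequence are non-increasing, so the
    # best tau of size >= sigma is the top-sigma prefix.
    best_total = sum(overlaps[:sigma])

    # Average inclusion ratio >= theta  <=>  total >= theta * |P| * |tau|.
    return best_total >= theta * len(pattern) * sigma


# Toy database (made up for illustration).
db = [
    frozenset("abcd"),
    frozenset("ab"),
    frozenset("abc"),
    frozenset("d"),
]
P = frozenset("abcd")

print(is_ambiguous_frequent(P, db, theta=0.75, sigma=3))  # overlaps 4+3+2 = 9 vs 9.0
print(is_ambiguous_frequent(P, db, theta=1.0, sigma=2))   # ordinary frequent itemset case
```

With θ = 1 the condition reduces to the ordinary frequent itemset definition, since every transaction in τ must then contain all of P.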