Mining frequently appearing patterns in a database is a fundamental problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an item set, called transactions, the problem is called the frequent itemset mining problem, and it has been studied extensively. The items in a frequent itemset appear together in many records, so they can be regarded as a cluster with respect to those records. In this sense, however, the condition that every item appears in every record is quite strong, and we should allow several items to be missing from these records. In this paper, we approach this problem from the viewpoint of algorithm theory and consider a model that can be solved efficiently and may be valuable in practice. We introduce ambiguous frequent itemsets, which allow missing items in their occurrence records. More precisely, for given thresholds θ and σ, an ambiguous frequent itemset P has a transaction set τ with |τ| ≥ σ such that, on average, the transactions in τ include a ratio θ of the items of P. We formulate the problem of enumerating ambiguous frequent itemsets and propose an efficient polynomial-delay, polynomial-space algorithm. Its practical performance is evaluated by computational experiments. Our algorithm extends naturally to the weighted version of the problem, which generalizes ordinary frequent itemset mining to weighted transaction databases and is equivalent to finding submatrices whose cells have a large average weight. An implementation is available at the author's homepage.
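To make the definition concrete, the following minimal Python sketch tests whether a single itemset P is an ambiguous frequent itemset. It reads "include a ratio θ on average" as "the mean of |t ∩ P| / |P| over τ is at least θ", which is an assumption about the intended inequality. The function name `is_ambiguous_frequent` and its parameters are hypothetical, and this is only a membership check, not the polynomial-delay enumeration algorithm proposed in the paper. The check relies on the observation that the best candidate τ consists of the σ transactions sharing the most items with P, since adding further transactions can never raise the average.

```python
def is_ambiguous_frequent(pattern, transactions, theta, sigma):
    """Check the ambiguous-frequency condition for one itemset (illustrative only).

    pattern      -- iterable of items (the itemset P)
    transactions -- list of iterables of items (the database)
    theta        -- required average inclusion ratio, 0 < theta <= 1
    sigma        -- required minimum number of transactions in tau
    """
    items = set(pattern)
    if not items or sigma <= 0 or len(transactions) < sigma:
        return False
    # Number of items of P contained in each transaction, largest first.
    counts = sorted((len(items & set(t)) for t in transactions), reverse=True)
    # The sigma best transactions maximize the average inclusion ratio.
    covered = sum(counts[:sigma])
    # Average ratio >= theta  <=>  covered >= theta * sigma * |P|.
    return covered >= theta * sigma * len(items)


# Small hypothetical database of four transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"d"}]
print(is_ambiguous_frequent({"a", "b", "c"}, db, theta=0.7, sigma=3))
print(is_ambiguous_frequent({"a", "b", "c"}, db, theta=1.0, sigma=3))
```

With θ = 1 the condition reduces to the ordinary frequent itemset condition, because every transaction in τ must then contain all of P.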