Mining Approximate Frequent Itemsets from Noisy Data

Authors:
Jinze Liu; Susan Paulsen; Wei Wang; Andrew Nobel; Jan Prins
Affiliations:
University of North Carolina at Chapel Hill (all authors)
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

References (5):
Mining association rules between sets of items in large databases. SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Fast discovery of association rules. Advances in knowledge discovery and data mining
Efficient discovery of error-tolerant frequent itemsets in high dimensions. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Support envelopes: a technique for exploring the structure of association patterns. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Dense itemsets. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Cited by (10):
Discovering frequent itemsets by support approximation and itemset clustering. Data & Knowledge Engineering
Index-BitTableFI: An improved algorithm for mining frequent itemsets. Knowledge-Based Systems
Quantitative evaluation of approximate frequent pattern mining algorithms. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
RAM: Randomized Approximate Graph Mining. SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Towards efficient mining of proportional fault-tolerant frequent itemsets. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Agglomerating local patterns hierarchically with ALPHA. Proceedings of the 18th ACM conference on Information and knowledge management
ABBA: adaptive bicluster-based approach to impute missing values in binary matrices. Proceedings of the 2010 ACM Symposium on Applied Computing
An efficient polynomial delay algorithm for pseudo frequent itemset mining. DS'07 Proceedings of the 10th international conference on Discovery science
Ambiguous frequent itemset mining and polynomial delay enumeration. PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Significance and recovery of block structures in binary matrices with noise. COLT'06 Proceedings of the 19th annual conference on Learning Theory

Abstract

Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional, "exact" approach to finding frequent itemsets requires that every item in the itemset occur in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise, traditional itemset mining may fail to detect relevant itemsets, particularly large itemsets, which are more vulnerable to noise. In this paper we propose approximate frequent itemsets (AFI) as a noise-tolerant itemset model. In addition to the usual requirement for sufficiently many supporting transactions, the AFI model constrains the fraction of errors permitted in each item column and the fraction of errors permitted in each supporting transaction. Taken together, these constraints winnow out the approximate itemsets that exhibit systematic errors. In the context of a simple noise model, we demonstrate that AFI is better at recovering underlying data patterns, while identifying fewer spurious patterns, than either the exact frequent itemset approach or the existing error-tolerant itemset approach of Yang et al. [11].
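
The two error constraints described in the abstract can be illustrated with a minimal sketch. It only checks whether a given set of transactions approximately supports a given itemset in a 0/1 matrix; it is not the paper's mining algorithm, and the paper's formal definition may differ in detail. The function name `is_afi` and the parameters `eps_row`, `eps_col`, and `min_support` are hypothetical names chosen for this illustration.

```python
from typing import Sequence


def is_afi(
    matrix: Sequence[Sequence[int]],
    rows: Sequence[int],
    items: Sequence[int],
    eps_row: float,
    eps_col: float,
    min_support: int,
) -> bool:
    """Check the AFI-style constraints on one candidate (rows, items) pair.

    All names are illustrative assumptions, not taken from the paper.
    matrix[r][i] is 1 if transaction r contains item i, else 0.
    """
    if not rows or not items or len(rows) < min_support:
        return False

    # Transaction constraint: each supporting transaction may be missing
    # at most a fraction eps_row of the items in the itemset.
    for r in rows:
        missing = sum(1 for i in items if matrix[r][i] == 0)
        if missing > eps_row * len(items):
            return False

    # Item-column constraint: each item may be missing from at most a
    # fraction eps_col of the supporting transactions.
    for i in items:
        missing = sum(1 for r in rows if matrix[r][i] == 0)
        if missing > eps_col * len(rows):
            return False

    return True


# Errors scattered evenly: one missing entry per row and per column.
scattered = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]

# Systematic error: item 3 is absent from every transaction.
systematic = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
]

all_rows = [0, 1, 2, 3]
all_items = [0, 1, 2, 3]

scattered_ok = is_afi(scattered, all_rows, all_items, 0.25, 0.25, 4)
scattered_exact = is_afi(scattered, all_rows, all_items, 0.0, 0.0, 4)
systematic_ok = is_afi(systematic, all_rows, all_items, 0.25, 0.25, 4)
```

In this toy data, the scattered-error block passes with a 25% tolerance in both directions but fails the exact test, while the block with an entirely missing item column is rejected by the column constraint even though every row stays within its row tolerance. This is the sense in which the combined constraints filter out systematic errors.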