Efficient discovery of error-tolerant frequent itemsets in high dimensions

Authors:
Cheng Yang;Usama Fayyad;Paul S. Bradley
Affiliations:
Stanford University, Stanford, CA;digiMine, Inc., Bellevue, WA;digiMine, Inc., Bellevue, WA
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Abstract

We present a generalization of frequent itemsets that allows for errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data such as customer-purchase data, web browsing data, and text. The algorithm exploits sparseness of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in sparse binary data. We evaluate the new algorithm on three real-world applications: clustering high-dimensional data, query selectivity estimation, and collaborative filtering. Results show that the algorithm consistently uncovers structure in large sparse databases that traditional clustering algorithms fail to find.
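
The abstract does not state the formal definition of an error-tolerant itemset, so the following is only an illustrative sketch of one plausible reading: a transaction is counted as supporting an itemset if it is missing at most a fraction `eps` of the itemset's items. The function name `tolerant_support`, the parameter `eps`, and the sample data are assumptions made for illustration and are not taken from the paper; the sketch shows the relaxed notion of support, not the paper's discovery algorithm.

```python
from typing import Hashable, Iterable, Sequence, Set


def tolerant_support(
    transactions: Sequence[Set[Hashable]],
    itemset: Iterable[Hashable],
    eps: float,
) -> float:
    """Fraction of transactions that miss at most eps * |itemset| items.

    Hypothetical helper: with eps = 0 this reduces to ordinary
    (exact) itemset support.
    """
    items = frozenset(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    if not transactions:
        raise ValueError("transactions must be non-empty")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")

    # Maximum number of itemset members a transaction may lack.
    max_missing = eps * len(items)

    # Count transactions whose number of missing items is within tolerance.
    hits = sum(1 for t in transactions if len(items - set(t)) <= max_missing)
    return hits / len(transactions)


if __name__ == "__main__":
    # Made-up sparse transactional data (each row is a set of items).
    data = [
        {"a", "b", "c"},
        {"a", "b"},
        {"a", "c", "d"},
        {"b", "c"},
        {"d"},
    ]
    print(tolerant_support(data, {"a", "b", "c"}, eps=0.0))  # exact support
    print(tolerant_support(data, {"a", "b", "c"}, eps=0.4))  # one miss allowed
```

On this toy data only one row contains all three items, yet four of the five rows contain at least two of them, which hints at why tolerating errors can expose groups of correlated items that exact support would miss in sparse data.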