We present a generalization of frequent itemsets that allows for errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.). The algorithm exploits sparseness of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in sparse binary data. We evaluate the new algorithm on three real-world applications: clustering high-dimensional data, query selectivity estimation, and collaborative filtering. Results show that the algorithm consistently uncovers structure in large sparse databases that traditional clustering algorithms fail to find.
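
To make the idea of errors in the itemset definition concrete, the following minimal Python sketch shows one plausible way to relax exact itemset support. It is an illustration under stated assumptions, not the algorithm described above: the abstract gives no formal definition, so the per-transaction tolerance `eps`, the frequency threshold `kappa`, and the function names `tolerant_support` and `is_error_tolerant_frequent` are all hypothetical. Here a transaction counts toward an itemset's support if it misses at most a fraction `eps` of the items.

```python
from math import ceil


def tolerant_support(transactions, itemset, eps):
    """Fraction of transactions that miss at most an eps fraction of itemset.

    Assumption (not from the abstract): a transaction "supports" the
    itemset if it contains at least (1 - eps) * |itemset| of its items.
    """
    items = set(itemset)
    if not transactions:
        return 0.0
    # Minimum number of matching items required; the small offset guards
    # against floating-point round-up before taking the ceiling.
    needed = ceil((1 - eps) * len(items) - 1e-9)
    hits = sum(1 for t in transactions if len(items & set(t)) >= needed)
    return hits / len(transactions)


def is_error_tolerant_frequent(transactions, itemset, eps, kappa):
    """True if the relaxed support reaches the assumed threshold kappa."""
    return tolerant_support(transactions, itemset, eps) >= kappa


if __name__ == "__main__":
    # Toy sparse transactional data (hypothetical).
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}, {"d"}]
    target = {"a", "b", "c"}
    print(tolerant_support(data, target, eps=0.0))    # exact matching
    print(tolerant_support(data, target, eps=1 / 3))  # one missing item allowed
```

With `eps = 0` the function reduces to ordinary itemset support, so the relaxed notion strictly generalizes the exact one. The sketch scans every transaction for a single candidate itemset and makes no attempt to exploit sparseness or to search for candidates efficiently.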