Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
A unifying review of linear Gaussian models
Neural Computation
Mining the most interesting rules
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering transactions using large items
Proceedings of the eighth international conference on Information and knowledge management
Transversing itemset lattices with statistical metric pruning
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Finding generalized projected clusters in high dimensional spaces
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
A condensed representation to find frequent patterns
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On feature distributional clustering for text categorization
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient discovery of error-tolerant frequent itemsets in high dimensions
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Machine Learning
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
Data Mining and Knowledge Discovery
Performance Evaluation of Some Clustering Algorithms and Validity Indices
IEEE Transactions on Pattern Analysis and Machine Intelligence
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
A Tight Upper Bound on the Number of Candidate Patterns
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Efficiently Mining Approximate Models of Associations in Evolving Databases
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Mining Generalized Association Rules
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Pushing Support Constraints Into Association Rules Mining
IEEE Transactions on Knowledge and Data Engineering
ROCK: A Robust Clustering Algorithm for Categorical Attributes
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Mining Bases for Association Rules Using Closed Sets
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Clustering binary data streams with K-means
DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Reducing borders of k-disjunction free representations of frequent patterns
Proceedings of the 2004 ACM symposium on Applied computing
Efficient Disk-Based K-Means Clustering for Relational Databases
IEEE Transactions on Knowledge and Data Engineering
The complexity of mining maximal frequent itemsets and maximal frequent patterns
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Accelerating EM clustering to find high-quality solutions
Knowledge and Information Systems
A model for association rules based on clustering
Proceedings of the 2005 ACM symposium on Applied computing
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Integrating K-Means Clustering with a Relational DBMS Using SQL
IEEE Transactions on Knowledge and Data Engineering
TAPER: A Two-Step Approach for All-Strong-Pairs Correlation Query in Large Databases
IEEE Transactions on Knowledge and Data Engineering
Finding association rules that trade support optimally against confidence
Intelligent Data Analysis
Learning quantifiable associations via principal sparse non-negative matrix factorization
Intelligent Data Analysis
Mining association rules using clustering
Intelligent Data Analysis
Approximate mining of frequent patterns on streams
Intelligent Data Analysis - Knowlegde Discovery from Data Streams
An efficient incremental mining algorithm-QSD
Intelligent Data Analysis
New probabilistic interest measures for association rules
Intelligent Data Analysis
Fast UDFs to compute sufficient statistics on large data sets exploiting caching and sampling
Data & Knowledge Engineering
Keyword search across databases and documents
Proceedings of the 2nd International Workshop on Keyword Search on Structured Data
Efficient algorithms based on relational queries to mine frequent graphs
PIKM '10 Proceedings of the 3rd workshop on Ph.D. students in information and knowledge management
Evaluating association rules and decision trees to predict multiple target attributes
Intelligent Data Analysis
Use of schema associative mapping for synchronization of the virtual machine audit logs
CISIS'11 Proceedings of the 4th international conference on Computational intelligence in security for information systems
Hi-index | 0.00 |
Association rules require models to understand their relationship to statistical properties of the data set. In this work, we study mathematical relationships between association rules and two fundamental techniques: clustering and correlation. Each cluster represents an important itemset. We show the sufficient statistics for clustering and correlation on binary data sets are the linear sum of points and the quadratic sum of points, respectively. We prove itemset support can be bounded and approximated from both models. Support bounds and support estimation obey the set downward closure property for fast bottom-up search for frequent itemsets. Both models can be efficiently computed with sparse matrix computations. Experiments with real and synthetic data sets evaluate model accuracy and speed. The clustering model is accurate to estimate support, given a sufficiently large number of clusters and it is more accurate than correlation, except for sets of two items. Accuracy increases as the number of clusters grows, but decreases as the minimum support threshold decreases. Once built, the clustering model represents a faster alternative than the traditional A-priori algorithm and the correlation model to mine associations. The correlation model is faster to compute than clustering, but it is less accurate. Time complexity to compute both models is linear on data set size, whereas dimensionality marginally impacts time when analyzing large transaction data sets.