Models for association rules based on clustering and correlation

Authors:
Carlos Ordonez
Affiliations:
Department of Computer Science, University of Houston, 4800 Calhoun Rd, Houston, TX 77204, USA. E-mail: ordonez@cs.uh.edu
Venue:
Intelligent Data Analysis
Year:
2009

Abstract

Models are needed to understand how association rules relate to statistical properties of the data set. In this work, we study mathematical relationships between association rules and two fundamental techniques: clustering and correlation. Each cluster represents an important itemset. We show that the sufficient statistics for clustering and correlation on binary data sets are the linear sum of points and the quadratic sum of points, respectively. We prove that itemset support can be bounded and approximated from both models. Both the support bounds and the support estimates obey the downward closure property of sets, which enables fast bottom-up search for frequent itemsets. Both models can be efficiently computed with sparse matrix computations. Experiments with real and synthetic data sets evaluate model accuracy and speed. Given a sufficiently large number of clusters, the clustering model estimates support accurately, and it is more accurate than the correlation model except for sets of two items. Accuracy increases as the number of clusters grows, but decreases as the minimum support threshold decreases. Once built, the clustering model is a faster alternative to both the traditional A-priori algorithm and the correlation model for mining associations. The correlation model is faster to compute than the clustering model, but it is less accurate. The time complexity of computing both models is linear in data set size, whereas dimensionality has only a marginal impact on time when analyzing large transaction data sets.
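
The abstract names two sufficient statistics, the linear sum and the quadratic sum of points, without giving formulas. The following is a minimal Python sketch of how they might be accumulated over sparse binary transactions, and of how support could be read off them. The toy data, the hand-picked two-cluster partition, the function names, and in particular the within-cluster independence assumption used for the estimate are all illustrative assumptions, not the paper's actual bounds or estimator.

```python
from itertools import combinations
from math import prod

# Toy binary data set: each transaction is the set of item indices it contains
# (a sparse representation). Illustrative data, not taken from the paper.
TRANSACTIONS = [
    {0, 1, 2},
    {0, 1},
    {0, 1, 2},
    {3, 4},
    {3, 4},
    {2, 3, 4},
]
D = 5  # number of items (dimensions)


def sufficient_statistics(transactions, d):
    """Return n, the linear sum L (length d) and the quadratic sum Q (d x d)."""
    n = len(transactions)
    L = [0] * d
    Q = [[0] * d for _ in range(d)]
    for t in transactions:
        items = sorted(t)
        for a in items:
            L[a] += 1
            Q[a][a] += 1  # x_a * x_a equals x_a for binary data
        for a, b in combinations(items, 2):
            Q[a][b] += 1  # co-occurrence count of items a and b
            Q[b][a] += 1
    return n, L, Q


def exact_support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def cluster_support_estimate(clusters, d, itemset):
    """Estimate support using only per-cluster linear sums.

    Assumption (illustrative): items are independent inside each cluster,
    so a cluster contributes its weight times the product of the centroid
    coordinates of the items in the itemset.
    """
    n = sum(len(c) for c in clusters)
    estimate = 0.0
    for c in clusters:
        n_j, L_j, _ = sufficient_statistics(c, d)
        centroid = [v / n_j for v in L_j]  # centroid = linear sum / cluster size
        estimate += (n_j / n) * prod(centroid[i] for i in itemset)
    return estimate


n, L, Q = sufficient_statistics(TRANSACTIONS, D)
pair_support_from_Q = Q[0][1] / n  # off-diagonal entries give pair counts

# Hypothetical 2-cluster partition chosen by hand; no clustering is run here.
clusters = [TRANSACTIONS[:3], TRANSACTIONS[3:]]
est_01 = cluster_support_estimate(clusters, D, {0, 1})
est_012 = cluster_support_estimate(clusters, D, {0, 1, 2})
```

On binary data the off-diagonal entries of the quadratic sum are pair co-occurrence counts, so pair support can be read from them directly, which is consistent with the abstract's remark that correlation is competitive for sets of two items. Because every centroid coordinate lies in [0, 1], the product-based estimate in this sketch cannot increase when an item is added to the itemset, which is the downward closure behaviour that a bottom-up search relies on.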