Bounding and Estimating Association Rule Support from Clusters on Binary Data

Authors:
Carlos Ordonez; Kai Zhao; Zhibo Chen
Venue:
ICDMW '08 Proceedings of the 2008 IEEE International Conference on Data Mining Workshops
Year:
2008

Citing 0
Cited 1

Efficient algorithms based on relational queries to mine frequent graphs

PIKM '10 Proceedings of the 3rd workshop on Ph.D. students in information and knowledge management


Abstract

The theoretical relationship between association rules and machine learning techniques needs to be studied in more depth. This article studies the use of clustering as a model for association rule mining. The clustering model is exploited to bound and estimate association rule support and confidence. We first study the efficient computation of the clustering model with K-means; we show that the sufficient statistic for clustering binary data sets is the linear sum of points. We then prove that itemset support can be bounded and estimated from the model. Finally, we show that the support bounds fulfill the set downward closure property. Experiments study model accuracy and algorithm speed, paying particular attention to error behavior in support estimation. Given a sufficiently large number of clusters, the model approximates support fairly accurately. However, as the minimum support threshold decreases, accuracy also decreases. The model discovers a large fraction of frequent itemsets at different support levels. The model is compared against a traditional association rule algorithm for mining frequent itemsets and exhibits better performance at low support levels. The time complexity of computing the binary cluster model is linear in data set size, whereas the dimensionality of transaction data sets has marginal impact on time.
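
The abstract does not give the paper's formulas, so the following is only a minimal sketch of how support might be bounded and estimated from a binary cluster model. It assumes cluster labels are already available (for example from K-means), keeps a per-cluster count alongside the linear sum, and uses the standard Fréchet inequalities for the bounds and a within-cluster independence assumption for the estimate. All function and variable names are hypothetical, and the bounds actually proved in the paper may differ.

```python
from math import prod

def cluster_summaries(rows, labels, k):
    """Per-cluster point counts and linear sums for 0/1 rows.

    On binary data, sums[j][i] / counts[j] is the fraction of points in
    cluster j that contain item i, i.e. the cluster centroid.
    """
    d = len(rows[0])
    counts = [0] * k
    sums = [[0] * d for _ in range(k)]
    for row, j in zip(rows, labels):
        counts[j] += 1
        for i, v in enumerate(row):
            sums[j][i] += v
    return counts, sums

def support_bounds(counts, sums, itemset):
    """Return (lower, estimate, upper) support for a non-empty itemset.

    Assumption (not taken from the paper): Frechet bounds inside each
    cluster, and item independence inside each cluster for the estimate.
    """
    n = sum(counts)
    lower = estimate = upper = 0.0
    for nj, lj in zip(counts, sums):
        if nj == 0:
            continue  # an empty cluster contributes nothing
        freqs = [lj[i] / nj for i in itemset]
        upper += nj * min(freqs)
        lower += nj * max(0.0, 1.0 - sum(1.0 - f for f in freqs))
        estimate += nj * prod(freqs)
    return lower / n, estimate / n, upper / n

def true_support(rows, itemset):
    """Exact support, computed by scanning every transaction."""
    return sum(all(r[i] for i in itemset) for r in rows) / len(rows)

# Toy transactions over three items, with hand-assigned cluster labels.
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
]
labels = [0, 0, 0, 1, 1, 1]
counts, sums = cluster_summaries(rows, labels, k=2)
lo, est, hi = support_bounds(counts, sums, itemset=[0, 1])
print(lo, est, hi, true_support(rows, [0, 1]))
```

In this sketch the upper bound can only shrink or stay the same when an item is added to the itemset, because the minimum is taken over more items. That is one way to see how a bound of this kind can respect downward closure. The bounds also need only the k cluster summaries rather than a scan of all transactions.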