Speeding up correlation search for binary data

Authors:
Lian Duan; W. Nick Street; Yanchi Liu
Venue:
Pattern Recognition Letters
Year:
2013

Abstract

Searching for correlated pairs in a collection of items is essential for many problems in commercial, medical, and scientific domains. Recently, much progress has been made in speeding up the search for pairs that have a high Pearson correlation (φ-coefficient). However, the φ-coefficient is neither the only correlation measure nor the best one. In this paper, we address an alternative task: finding correlated pairs under any "good" correlation measure, that is, one that satisfies the three widely accepted correlation properties in Section 2.1. We identify a 1-dimensional monotone property of the upper bound of any "good" correlation measure, and different 2-dimensional monotone properties for different types of correlation measures. Using these properties, we can prune many pairs without computing their correlations, either with the 2-dimensional search algorithm, which retrieves correlated pairs above a given threshold, or with our new token-ring algorithm, which finds the top-k correlated pairs. The experimental results show that our robust algorithm can efficiently search for correlated pairs under different situations and is an order of magnitude faster than the brute-force method.
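
To make the idea of pruning pairs without computing their correlations concrete, the following minimal Python sketch applies it to the φ-coefficient only. It is not the 2-dimensional search or the token-ring algorithm from the paper. It assumes the standard φ upper bound that depends only on the two item supports (reached when the rarer item always co-occurs with the more frequent one), and the function names `phi`, `phi_upper_bound`, and `correlated_pairs` are hypothetical, chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """phi-coefficient of two binary items, computed from occurrence counts."""
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        return 0.0  # an item present in no row or in every row has no variance
    return (n * n_ab - n_a * n_b) / denom


def phi_upper_bound(n, n_a, n_b):
    """Largest phi reachable given only the supports (assumes n_ab = min(n_a, n_b))."""
    return phi(n, n_a, n_b, min(n_a, n_b))


def correlated_pairs(transactions, threshold):
    """Return ({pair: phi}, number of pairs skipped thanks to the bound)."""
    n = len(transactions)
    rows = {}  # item -> set of row indices that contain it
    for i, transaction in enumerate(transactions):
        for item in transaction:
            rows.setdefault(item, set()).add(i)

    result, pruned = {}, 0
    for a, b in combinations(sorted(rows), 2):
        n_a, n_b = len(rows[a]), len(rows[b])
        # Cheap test first: it needs only the supports, not the co-occurrence count.
        if phi_upper_bound(n, n_a, n_b) < threshold:
            pruned += 1
            continue
        value = phi(n, n_a, n_b, len(rows[a] & rows[b]))
        if value >= threshold:
            result[(a, b)] = value
    return result, pruned
```

Calling `correlated_pairs(transactions, 0.8)` on a list of item sets returns the pairs whose φ reaches the threshold together with a count of the pairs that were discarded from their supports alone. A brute-force search would instead intersect the row sets of every pair.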