TOP-COP: Mining TOP-K Strongly Correlated Pairs in Large Databases

Authors:
Hui Xiong; Mark Brodie; Sheng Ma
Affiliations:
Rutgers University, USA; IBM TJ Watson; Vivido Media Inc.
Venue:
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Year:
2006

Citing 0
Cited 4

Top-K Correlation Sub-graph Search in Graph Databases
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications

Mining correlated subgraphs in graph databases
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining

Scaling up top-K cosine similarity search
Data & Knowledge Engineering

Correlation range query
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Abstract

Recently, there has been considerable interest in computing strongly correlated pairs in large databases. Most previous studies require the user to specify a minimum correlation threshold before the computation can run. However, it may be difficult for users to provide an appropriate threshold in practice, since different data sets typically have different characteristics. To this end, we propose an alternative task: mining the top-k strongly correlated pairs. In this paper, we identify a 2-D monotone property of an upper bound of Pearson's correlation coefficient and develop an efficient algorithm, called TOP-COP, that exploits this property to prune many pairs without ever computing their correlation coefficients. Our experimental results show that the TOP-COP algorithm can be orders of magnitude faster than brute-force alternatives for mining the top-k strongly correlated pairs.
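To make the pruning idea concrete, here is a minimal Python sketch. It is not the TOP-COP algorithm itself: it does not use the 2-D monotone traversal described in the abstract. It only shows how an upper bound computed from item supports can skip pairs during a top-k search. It assumes binary (market-basket) data, where Pearson's coefficient reduces to the phi coefficient, and it assumes the bound in question is the standard support-based one, obtained by setting the joint support to the smaller of the two item supports. All function names (`phi`, `phi_upper`, `top_k_pairs`) are hypothetical.

```python
import heapq
from itertools import combinations
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """Pearson correlation (phi coefficient) of two binary items.

    n is the number of transactions, n_a and n_b the item counts,
    n_ab the number of transactions containing both items.
    """
    return (n * n_ab - n_a * n_b) / sqrt(n_a * (n - n_a) * n_b * (n - n_b))


def phi_upper(n, n_a, n_b):
    """Upper bound on phi that needs only the two item counts.

    Assumption: this is phi evaluated at the largest possible
    joint count, n_ab = min(n_a, n_b).
    """
    hi, lo = max(n_a, n_b), min(n_a, n_b)
    return sqrt(lo * (n - hi) / (hi * (n - lo)))


def top_k_pairs(transactions, k):
    """Return up to k (phi, item_a, item_b) tuples, highest phi first."""
    n = len(transactions)
    tids = {}  # item -> set of transaction ids containing it
    for t, items in enumerate(transactions):
        for item in set(items):
            tids.setdefault(item, set()).add(t)
    # phi is undefined for items that appear in every transaction.
    items = sorted(i for i in tids if len(tids[i]) < n)

    heap = []  # min-heap holding the current best k pairs
    for a, b in combinations(items, 2):
        n_a, n_b = len(tids[a]), len(tids[b])
        # Prune: the bound cannot beat the current k-th best score,
        # so the exact coefficient is never computed for this pair.
        if len(heap) == k and phi_upper(n, n_a, n_b) <= heap[0][0]:
            continue
        score = phi(n, n_a, n_b, len(tids[a] & tids[b]))
        if len(heap) < k:
            heapq.heappush(heap, (score, a, b))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, a, b))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    baskets = [["a", "b"], ["a", "b"], ["a", "b", "c"],
               ["c"], ["c", "d"], ["d"]]
    for score, a, b in top_k_pairs(baskets, 2):
        print(f"{a}-{b}: {score:.3f}")
```

The sketch still enumerates every pair, so it saves only the cost of the intersection that computes the joint count. Per the abstract, TOP-COP goes further by exploiting the 2-D monotone property of the bound to prune many pairs at once.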