Correlation range query

Authors:
Wenjun Zhou;Hao Zhang
Affiliations:
Statistics, Operations, and Management Science Department, University of Tennessee, Knoxville, TN and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, ...;Statistics, Operations, and Management Science Department, University of Tennessee, Knoxville, TN and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, ...
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

Efficient correlation computation has been an active research area in data mining. Given a large dataset and a specified query item, we are interested in finding the items in the dataset whose correlation with the query item falls within a specified range. This problem, known as the correlation range query (CRQ), is a common task in many application domains. In this paper, we identify piecewise monotone properties of the upper and lower bounds of the φ coefficient and propose an efficient correlation range query algorithm called CORAQ. CORAQ prunes many items without computing their actual correlation coefficients with the query item, while still guaranteeing complete and correct query results. Experiments with large benchmark datasets show that the algorithm is much faster than its brute-force alternative and scales well with large datasets.
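
To make the idea of bound-based pruning concrete, here is a minimal Python sketch of a correlation range query over binary (market-basket) data. It is not the CORAQ algorithm and does not use the piecewise monotone properties the abstract mentions. It only relies on the standard fact that the joint support of two items lies between max(0, sup(A) + sup(B) - 1) and min(sup(A), sup(B)), which gives bounds on φ from the individual supports alone. The function names, the set-based transaction format, and the returned prune counter are all assumptions made for illustration.

```python
from math import sqrt


def phi(sup_a, sup_b, sup_ab):
    """Phi coefficient of two binary items, given supports as fractions in (0, 1)."""
    denom = sqrt(sup_a * sup_b * (1 - sup_a) * (1 - sup_b))
    return (sup_ab - sup_a * sup_b) / denom


def phi_bounds(sup_a, sup_b):
    """Lower and upper bounds on phi using only the individual supports.

    Relies on max(0, sup_a + sup_b - 1) <= sup_ab <= min(sup_a, sup_b).
    """
    lower = phi(sup_a, sup_b, max(0.0, sup_a + sup_b - 1.0))
    upper = phi(sup_a, sup_b, min(sup_a, sup_b))
    return lower, upper


def correlation_range_query(transactions, query, lo, hi):
    """Return ({item: phi}, pruned_count) for items with lo <= phi <= hi.

    `transactions` is a list of sets of items (an assumed input format).
    Items whose bound interval misses [lo, hi] are skipped without
    counting their co-occurrences with the query item.
    """
    n = len(transactions)
    # Build a transaction-id set per item in one pass over the data.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    q_tids = tidsets[query]
    if len(q_tids) == n:
        raise ValueError("phi is undefined when the query item is in every transaction")
    sup_q = len(q_tids) / n

    result, pruned = {}, 0
    for item, tids in tidsets.items():
        if item == query or len(tids) == n:
            continue  # skip the query itself and items with undefined phi
        sup_i = len(tids) / n
        lower, upper = phi_bounds(sup_q, sup_i)
        if upper < lo or lower > hi:
            pruned += 1  # the bounds alone rule this item out
            continue
        value = phi(sup_q, sup_i, len(q_tids & tids) / n)
        if lo <= value <= hi:
            result[item] = value
    return result, pruned
```

Because the bounds never exclude an item whose true φ lies in the range, this filter keeps the result complete and correct. How much work it saves depends on how tight the bounds are for the supports in the data.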