Correlation range query

  • Authors:
  • Wenjun Zhou; Hao Zhang

  • Affiliations:
  • Statistics, Operations, and Management Science Department, University of Tennessee, Knoxville, TN and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, ...

  • Venue:
  • WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
  • Year:
  • 2013

Abstract

Efficient correlation computation has been an active research area in data mining. Given a large dataset and a specified query item, we are interested in finding the items in the dataset whose correlation with the query item falls within a specified range. This problem, known as the correlation range query (CRQ), is a common task in many application domains. In this paper, we identify piecewise monotone properties of the upper and lower bounds of the φ coefficient and propose an efficient correlation range query algorithm called CORAQ. CORAQ prunes many items without computing their actual correlation coefficients with the query item, while still returning complete and correct query results. Experiments with large benchmark datasets show that the algorithm is much faster than its brute-force alternative and scales well with large datasets.
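
To make the pruning idea concrete, the following is a minimal sketch of a correlation range query over binary (transaction) data. It is not the CORAQ algorithm itself, whose details the abstract does not give. It assumes only the standard support-based form of the φ coefficient and the elementary fact that the joint support of two items lies between max(0, supp(A) + supp(B) − 1) and min(supp(A), supp(B)), which yields a lower and an upper bound on φ from the individual supports alone. The names `phi`, `phi_bounds`, and `range_query`, the data layout (a dict mapping item names to sets of transaction ids), and the toy data are all hypothetical choices made for illustration.

```python
from math import sqrt


def phi(s_a, s_b, s_ab):
    """phi coefficient from supports (fractions strictly between 0 and 1)."""
    return (s_ab - s_a * s_b) / sqrt(s_a * s_b * (1 - s_a) * (1 - s_b))


def phi_bounds(s_a, s_b):
    """Bounds on phi using only the individual supports.

    phi increases with the joint support, so plugging in the smallest
    and largest feasible joint supports gives a lower and upper bound.
    """
    lower = phi(s_a, s_b, max(0.0, s_a + s_b - 1.0))
    upper = phi(s_a, s_b, min(s_a, s_b))
    return lower, upper


def range_query(items, query, n, low, high):
    """Return items whose phi with `query` lies in [low, high].

    items: dict mapping item name -> set of transaction ids (assumed layout)
    n:     total number of transactions
    Also returns how many exact phi values had to be computed.
    Assumes every item has support strictly between 0 and 1.
    """
    q_tids = items[query]
    s_q = len(q_tids) / n
    result, computed = {}, 0
    for name, tids in items.items():
        if name == query:
            continue
        s_x = len(tids) / n
        lower, upper = phi_bounds(s_q, s_x)
        if upper < low or lower > high:
            continue  # pruned: no joint support could place phi in range
        computed += 1
        value = phi(s_q, s_x, len(q_tids & tids) / n)
        if low <= value <= high:
            result[name] = value
    return result, computed


# Toy data: 10 transactions, ids 0..9 (made up for illustration).
items = {
    "a": {0, 1, 2, 3, 4},
    "b": {0, 1, 2, 3, 5},
    "c": {9},
    "d": {5, 6, 7, 8, 9},
}
matches, computed = range_query(items, "a", n=10, low=0.5, high=1.0)
print(matches, computed)
```

In this toy run, item "c" is discarded from its support alone, because its upper bound on φ with "a" is below 0.5, so its intersection with the query item is never computed. A brute-force query would compute the exact coefficient for every item. The sketch still evaluates the bounds once per item; the abstract indicates that CORAQ goes further by exploiting piecewise monotonicity of those bounds, which this example does not attempt.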