Volatile correlation computation: a checkpoint view

Authors: Wenjun Zhou; Hui Xiong
Affiliations: Rutgers University, Newark, NJ, USA (both authors)
Venue: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Year: 2008

Abstract

Recent years have witnessed increased interest in computing strongly correlated pairs in very large databases. Most previous studies have focused on static data sets. In real-world applications, however, input data are often dynamic and must be updated continually. Such large and growing data sets call for an incremental solution to correlation computing. To this end, we propose the CHECK-POINT algorithm, which efficiently incorporates new transactions into the correlation computation as they become available. Specifically, we set a checkpoint to establish a computation buffer, which lets us determine an upper bound on the correlation. This checkpoint bound is used to identify a list of candidate pairs, whose correlations are maintained and recomputed as new transactions are added to the database. Once the total number of new transactions exceeds the buffer size, a new checkpoint is set, a new upper bound is computed, and a new list of candidate pairs is identified. Experimental results on real-world data sets show that CHECK-POINT can significantly reduce the cost of correlation computing in dynamic data sets while using memory compactly.
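
The checkpoint-and-buffer loop described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the correlation measure is taken to be the φ coefficient (Pearson correlation for binary items), the checkpoint bound is replaced by a brute-force enumeration of every count combination reachable within the buffer, and the full transaction list is kept in memory so that a checkpoint can rescan it. The names `phi`, `phi_upper_bound`, `CheckpointMonitor`, `theta`, and `delta` are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def phi(n, na, nb, nab):
    """Phi coefficient of two binary items from their support counts."""
    denom = na * (n - na) * nb * (n - nb)
    if denom == 0:
        return 0.0  # assumption: treat the degenerate case as uncorrelated
    return (n * nab - na * nb) / sqrt(denom)


def phi_upper_bound(n, na, nb, nab, delta):
    """Largest phi reachable after at most `delta` further transactions.

    Brute-force stand-in for the paper's checkpoint bound: each new
    transaction contains both items, only one of them, or neither.
    """
    best = phi(n, na, nb, nab)
    for both in range(delta + 1):
        for only_a in range(delta + 1 - both):
            for only_b in range(delta + 1 - both - only_a):
                used = both + only_a + only_b
                for neither in range(delta + 1 - used):
                    best = max(best, phi(n + used + neither,
                                         na + both + only_a,
                                         nb + both + only_b,
                                         nab + both))
    return best


class CheckpointMonitor:
    """Maintains pair counts only for candidates chosen at a checkpoint."""

    def __init__(self, transactions, items, theta, delta):
        self.db = [frozenset(t) for t in transactions]
        self.items = sorted(items)
        self.theta = theta  # correlation threshold
        self.delta = delta  # buffer size: new transactions per checkpoint
        self._checkpoint()

    def _checkpoint(self):
        # Rescan the database and keep only pairs whose bound reaches theta.
        self.n = len(self.db)
        self.single = {i: sum(i in t for t in self.db) for i in self.items}
        self.pair = {}
        for a, b in combinations(self.items, 2):
            nab = sum(a in t and b in t for t in self.db)
            bound = phi_upper_bound(self.n, self.single[a],
                                    self.single[b], nab, self.delta)
            if bound >= self.theta:
                self.pair[(a, b)] = nab
        self.since = 0  # transactions added since this checkpoint

    def add(self, transaction):
        if self.since == self.delta:
            self._checkpoint()  # buffer exhausted: set a new checkpoint
        t = frozenset(transaction)
        self.db.append(t)
        self.n += 1
        self.since += 1
        for i in t:
            if i in self.single:
                self.single[i] += 1
        for a, b in self.pair:  # only candidate pairs are updated
            if a in t and b in t:
                self.pair[(a, b)] += 1

    def strong_pairs(self):
        """Candidate pairs whose current phi is at least theta."""
        return {p for p, nab in self.pair.items()
                if phi(self.n, self.single[p[0]],
                       self.single[p[1]], nab) >= self.theta}
```

A caller would build a `CheckpointMonitor` from the existing transactions, call `add` for each arriving transaction, and read `strong_pairs()` whenever the current result is needed. The enumeration in `phi_upper_bound` grows quickly with `delta` and is meant only to make the pruning logic concrete. Keeping the whole database in memory likewise does not reflect the compact memory use reported in the abstract.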