Recent years have witnessed increased interest in computing strongly correlated pairs in very large databases. Most previous studies have focused on static data sets. In real-world applications, however, input data are often dynamic and must be updated continually. With such large and growing data sets, an incremental solution for correlation computing is needed. Along this line, we propose CHECK-POINT, an algorithm that efficiently incorporates new transactions into correlation computing as they become available. Specifically, we set a checkpoint to establish a computation buffer, which helps us determine an upper bound for the correlation. This checkpoint bound is used to identify a list of candidate pairs, whose correlations are maintained and recomputed as new transactions are added to the database. If the total number of new transactions exceeds the buffer size, a new upper bound is computed at a new checkpoint and a new list of candidate pairs is identified. Experimental results on real-world data sets show that CHECK-POINT significantly reduces the cost of correlation computing in dynamic data sets and keeps memory usage compact.
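The checkpoint idea can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the correlation measure is taken to be the phi coefficient computed from support counts, the upper bound is found by brute-force enumeration of every way the buffer could fill up (a stand-in for illustration only, practical just for tiny buffers, and not the bound derived in the paper), and the names `phi`, `checkpoint_upper_bound`, and `candidates_at_checkpoint` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """Phi correlation of items A and B from support counts over n transactions."""
    denom = n_a * n_b * (n - n_a) * (n - n_b)
    if denom <= 0:
        return 0.0  # degenerate: an item occurs in no transaction or in all of them
    return (n * n_ab - n_a * n_b) / sqrt(denom)


def checkpoint_upper_bound(n, n_a, n_b, n_ab, buffer_size):
    """Largest phi reachable after at most buffer_size new transactions.

    ASSUMPTION: brute-force enumeration, used here only as an illustrative
    stand-in for a closed-form checkpoint bound. Cost grows as buffer_size**4.
    """
    best = phi(n, n_a, n_b, n_ab)
    for d in range(1, buffer_size + 1):  # number of new transactions
        for both in range(d + 1):  # new transactions containing A and B
            for only_a in range(d - both + 1):  # containing A but not B
                for only_b in range(d - both - only_a + 1):  # B but not A
                    value = phi(
                        n + d,
                        n_a + both + only_a,
                        n_b + both + only_b,
                        n_ab + both,
                    )
                    best = max(best, value)
    return best


def candidates_at_checkpoint(transactions, buffer_size, theta):
    """Pairs whose phi could reach theta before the buffer is exhausted."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b)
        for a, b in combinations(sorted(item_counts), 2)
        if checkpoint_upper_bound(
            n, item_counts[a], item_counts[b], pair_counts[(a, b)], buffer_size
        ) >= theta
    }
```

Under these assumptions, only the pairs returned by `candidates_at_checkpoint` need their counts maintained while new transactions arrive; once more than `buffer_size` transactions have been added, the function would be called again to set a new checkpoint and rebuild the candidate list.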