The Computational Complexity of High-Dimensional Correlation Search

Authors:
Chris Jermaine
Affiliations:
-
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Citing 0
Cited 6

Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
TAPER: A Two-Step Approach for All-Strong-Pairs Correlation Query in Large Databases

IEEE Transactions on Knowledge and Data Engineering
Volatile correlation computation: a checkpoint view

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Information discovery across multiple streams

Information Sciences: an International Journal
Efficient set-correlation operator inside databases

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Dynamic rank correlation computing for financial risk analysis

KSEM'11 Proceedings of the 5th international conference on Knowledge Science, Engineering and Management

Abstract

There is a growing awareness that the popular support metric (often used to guide search in market-basket analysis) is not appropriate for use in every association mining application. Support measures only the frequency of co-occurrence of a set of events when determining which patterns to report back to the user. It incorporates no rigorous statistical notion of surprise or interest, and many of the patterns deemed interesting by the support metric are uninteresting to the user. However, a positive aspect of support is that search using support is very efficient. The question we address in this paper is: can we retain this efficiency if we move beyond support, and to other, more rigorous metrics? We consider the computational implications of incorporating simple expectation into the data mining task. It turns out that many variations on the problem which incorporate more rigorous tests of dependence (or independence) result in NP-hard problem definitions.
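
The contrast the abstract draws between raw support and an expectation-based measure can be illustrated with a small sketch. This is not the paper's formulation: the ratio of observed support to the support expected under independence (commonly called lift) is assumed here as one simple way to "incorporate expectation", and the function names and the toy `baskets` data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def expected_support(transactions, itemset):
    """Support the itemset would have if its items occurred independently.

    Assumption for illustration: independence is modeled as the product
    of the single-item supports.
    """
    expected = 1.0
    for item in itemset:
        expected *= support(transactions, [item])
    return expected


def lift(transactions, itemset):
    """Observed support divided by the support expected under independence."""
    exp = expected_support(transactions, itemset)
    return support(transactions, itemset) / exp if exp > 0 else float("nan")


# Hypothetical toy data: one frequent pair and one rare but tightly linked pair.
baskets = [frozenset(b) for b in (
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"caviar", "vodka"},
)]

for pair in ({"bread", "milk"}, {"caviar", "vodka"}):
    print(sorted(pair), support(baskets, pair), lift(baskets, pair))
```

On this toy data the bread and milk pair has high support but a lift close to 1, so it is roughly what independence would predict, while the caviar and vodka pair has low support but a much larger lift. A support threshold would report the first pair and discard the second, which is the mismatch between frequency and surprise that the abstract describes.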