Playing hide-and-seek with correlations

Authors:
Christopher Jermaine
Affiliations:
University of Florida, Gainesville, FL
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Citing 9
Cited 4

A new approach to the maximum-flow problem

Journal of the ACM (JACM)
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Approximating clique and biclique problems

Journal of Algorithms
Mining the most interesting rules

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Transversing itemset lattices with statistical metric pruning

PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient search for association rules

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding the most interesting correlations in a database: how hard can it be?

Information Systems

Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
TAPER: A Two-Step Approach for All-Strong-Pairs Correlation Query in Large Databases

IEEE Transactions on Knowledge and Data Engineering
Volatile correlation computation: a checkpoint view

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Dynamic rank correlation computing for financial risk analysis

KSEM'11 Proceedings of the 5th international conference on Knowledge Science, Engineering and Management

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a method for very high-dimensional correlation analysis. The method relies equally on rigorous search strategies and on human interaction. At each step, the method conservatively "shaves off" a fraction of the database tuples and attributes, so that most of the correlations present in the data are not affected by the decomposition. Instead, the correlations become more obvious to the user, because they are hidden in a much smaller portion of the database. This process can be repeated iteratively and interactively, until only the most important correlations remain.The main technical difficulty of the approach is figuring out how to "shave off" part of the database so as to preserve most correlations. We develop an algorithm for this problem that has a polynomial running time and guarantees result quality.