Playing hide-and-seek with correlations

  • Authors: Christopher Jermaine
  • Affiliations: University of Florida, Gainesville, FL
  • Venue: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year: 2003

Abstract

We present a method for very high-dimensional correlation analysis. The method relies equally on rigorous search strategies and on human interaction. At each step, the method conservatively "shaves off" a fraction of the database tuples and attributes, so that most of the correlations present in the data are not affected by the decomposition. Instead, the correlations become more obvious to the user, because they are hidden in a much smaller portion of the database. This process can be repeated iteratively and interactively, until only the most important correlations remain.

The main technical difficulty of the approach is figuring out how to "shave off" part of the database so as to preserve most correlations. We develop an algorithm for this problem that has a polynomial running time and guarantees result quality.