Efficient Multidimensional Quantitative Hypotheses Generation

  • Authors:
  • Amihood Amir; Reuven Kashi; Nathan S. Netanyahu

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Finding local interrelations (hypotheses) among attributes within very large databases of high dimensionality is an acute problem for many database and data mining applications. These include dependency modeling, clustering large databases, and correlation and link analysis. Traditional statistical methods are concerned with the corroboration of (a set of) hypotheses on a given body of data. Testing all of the hypotheses that can be generated from a database with millions of records and dozens of fields is clearly infeasible. Generating a set of the most "promising" hypotheses (to be corroborated), on the other hand, requires much intuition and ingenuity. In this paper we present an efficient method for ranking multidimensional hypotheses using image processing of data visualizations. At the heart of the method lies the use of visualization techniques and image processing ideas to rank subsets of attributes according to the relation between them in the database. Some of the scalability issues are solved by concise generalized histograms and by an efficient on-line computation of clustering around a median that needs only five additional memory words. In addition to presenting our algorithmic methodology, we demonstrate its efficiency and performance by applying it to real census data sets as well as synthetic data sets.
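The general idea of ranking attribute subsets by how strongly they are related can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the scoring function, so the sketch assumes attribute pairs only, an equal-width 2-D histogram as a stand-in for the "generalized histograms", and mutual information as a stand-in score. The names `bin_index`, `pair_score`, and `rank_pairs` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..bins-1."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * bins)
    return min(i, bins - 1)  # the maximum value falls into the last bin


def pair_score(xs, ys, bins=8):
    """Mutual information (in nats) of the binned 2-D histogram of (xs, ys).

    Assumption: mutual information is used here only as an illustrative
    measure of how related two attributes are.
    """
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    # Joint histogram: one pass over the records.
    joint = Counter()
    for x, y in zip(xs, ys):
        cell = (bin_index(x, x_lo, x_hi, bins), bin_index(y, y_lo, y_hi, bins))
        joint[cell] += 1

    # Marginal histograms derived from the joint counts.
    px, py = Counter(), Counter()
    for (i, j), c in joint.items():
        px[i] += c
        py[j] += c

    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def rank_pairs(columns, bins=8):
    """Return (attr_a, attr_b, score) tuples, most related pair first."""
    scored = [
        (pair_score(columns[a], columns[b], bins), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    scored.sort(reverse=True)
    return [(a, b, s) for s, a, b in scored]
```

Calling `rank_pairs` on a dictionary that maps attribute names to equal-length lists of numbers returns the candidate pairs ordered by score, so only the top few need to be passed on to formal hypothesis testing.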