Heuristics for Ranking the Interestingness of Discovered Knowledge

  • Authors:
  • Robert J. Hilderman;Howard J. Hamilton

  • Venue:
  • PAKDD '99 Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining
  • Year:
  • 1999

Abstract

We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique and can therefore be considered a population described by some probability distribution. The four interestingness measures presented here are based upon common measures of the diversity of a population: variance, the Simpson index, and the Shannon index. Using each of the proposed measures, we assign a single real value to a summary that describes its interestingness. Our experimental results show that the ranks assigned by the four interestingness measures are highly correlated.
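To make the idea of scoring a summary concrete, the following is a minimal sketch that uses the standard textbook forms of the three diversity measures named in the abstract. It is not taken from the paper: the function names, the choice to treat each tuple's count as its share of the population, the sample variance against a uniform distribution, and the base-2 logarithm are all assumptions made for illustration, and the paper's four measures may be defined differently.

```python
import math


def proportions(counts):
    """Turn per-tuple counts into a probability distribution (assumed input form)."""
    total = sum(counts)
    return [c / total for c in counts]


def variance_measure(counts):
    """Sample variance of the tuple proportions around the uniform value 1/m."""
    p = proportions(counts)
    m = len(p)
    if m < 2:
        return 0.0  # a single tuple has no spread
    q = 1.0 / m
    return sum((pi - q) ** 2 for pi in p) / (m - 1)


def simpson_index(counts):
    """Simpson index: the sum of squared proportions (higher means more concentrated)."""
    return sum(pi ** 2 for pi in proportions(counts))


def shannon_index(counts):
    """Shannon index: entropy of the proportions, in bits (higher means more even)."""
    return -sum(pi * math.log2(pi) for pi in proportions(counts) if pi > 0)


def rank_summaries(summaries, measure=variance_measure):
    """Return summary names ordered from highest to lowest score under `measure`."""
    return sorted(summaries, key=lambda name: measure(summaries[name]), reverse=True)


if __name__ == "__main__":
    # Hypothetical summaries: each maps a name to the counts of its tuples.
    example = {"uniform": [10, 10, 10, 10], "skewed": [37, 1, 1, 1]}
    for name in rank_summaries(example):
        counts = example[name]
        print(name, variance_measure(counts), simpson_index(counts), shannon_index(counts))
```

Under these assumed definitions, a summary whose tuples are evenly distributed scores zero on the variance measure, while a heavily skewed summary scores higher, so the skewed summary ranks first. Whether a high or a low value should count as more interesting depends on the measure and is a design choice this sketch does not settle.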