Heuristics for Ranking the Interestingness of Discovered Knowledge

Authors:
Robert J. Hilderman;Howard J. Hamilton
Venue:
PAKDD '99 Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining
Year:
1999

Citing 9

From data mining to knowledge discovery: an overview
Advances in knowledge discovery and data mining

Efficient Attribute-Oriented Generalization for Knowledge Discovery from Large Databases
IEEE Transactions on Knowledge and Data Engineering

Parallel Knowledge Discovery Using Domain Generalization Graphs
PKDD '97 Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery

Share Based Measures for Itemsets
PKDD '97 Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery

Generalization Lattices
PKDD '98 Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery

Identifying Relevant Databases for Multidatabase Mining
PAKDD '98 Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining

Ranking the Interestingness of Summaries from Data Mining Systems
Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference

Mining Market Basket Data Using Share Measures and Characterized Itemsets
PAKDD '98 Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining

A Mathematical Theory of Communication

Cited 2

Evaluation of Interestingness Measures for Ranking Discovered Knowledge
PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining

Biological development of cell patterns: characterizing the space of cell chemistry genetic regulatory networks
ECAL'05 Proceedings of the 8th European conference on Advances in Artificial Life

Abstract

We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique and can therefore be considered a population described by some probability distribution. The four interestingness measures presented here are based upon common measures of the diversity of a population: variance, the Simpson index, and the Shannon index. Using each of the proposed measures, we assign to a summary a single real value that describes its interestingness. Our experimental results show that the ranks assigned by the four interestingness measures are highly correlated.
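To make the three diversity measures named in the abstract concrete, the following is a minimal sketch in Python. It uses the common textbook definitions, which may differ in detail from the exact formulations in the paper. It assumes a summary is represented only by the list of counts attached to its tuples; the function names, the example summaries, and the choice of ranking direction are all hypothetical and introduced here purely for illustration.

```python
import math


def proportions(counts):
    """Turn per-tuple counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def variance_measure(counts):
    """Sample variance of the proportions around the uniform value 1/m."""
    p = proportions(counts)
    m = len(p)
    if m < 2:
        return 0.0
    mean = 1.0 / m
    return sum((x - mean) ** 2 for x in p) / (m - 1)


def simpson_index(counts):
    """Simpson index: sum of squared proportions (1/m when uniform)."""
    return sum(x * x for x in proportions(counts))


def shannon_index(counts):
    """Shannon index in bits: -sum(p * log2(p)), skipping empty tuples."""
    return -sum(x * math.log2(x) for x in proportions(counts) if x > 0)


def rank_summaries(summaries, measure, descending=True):
    """Order summary names by a measure (ranking direction is the caller's choice)."""
    return sorted(summaries, key=lambda name: measure(summaries[name]),
                  reverse=descending)


# Hypothetical summaries: each maps to the counts of its tuples.
summaries = {
    "uniform": [5, 5, 5, 5],
    "skewed": [17, 1, 1, 1],
}

for name, counts in summaries.items():
    print(name, variance_measure(counts), simpson_index(counts),
          shannon_index(counts))

print(rank_summaries(summaries, simpson_index))
```

With these definitions a perfectly uniform summary has zero variance, the smallest possible Simpson index, and the largest possible Shannon index, so rankings by the Shannon index run in the opposite direction to the other two. Whether a more or less uniform summary counts as more interesting is left to the `descending` argument rather than fixed in the sketch.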