Evaluation of Interestingness Measures for Ranking Discovered Knowledge

Authors:
Robert J. Hilderman; Howard J. Hamilton
Venue:
PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Year:
2001

Abstract

When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
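
As a rough illustration of ranking summaries by a diversity measure, the sketch below scores each summary by the Shannon entropy of its tuple counts and sorts the summaries by that score. The abstract does not name the thirteen measures, so Shannon entropy is used here only as a familiar example from information theory. The function names, the representation of a summary as a list of counts, and the choice to rank higher entropy first are all assumptions made for this example, not the authors' method.

```python
import math
from typing import Dict, List, Tuple


def shannon_diversity(counts: List[int]) -> float:
    """Shannon entropy (in bits) of the distribution given by tuple counts.

    Illustrative diversity measure only; hypothetical helper, not taken
    from the paper.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    entropy = 0.0
    for count in counts:
        if count > 0:  # zero counts contribute nothing to the entropy
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def rank_summaries(summaries: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted by diversity score.

    Assumption: higher entropy is ranked first. Whether high or low
    diversity should count as more interesting is a design choice.
    """
    scored = [(name, shannon_diversity(counts)) for name, counts in summaries.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical summaries: each list holds the tuple counts of one
    # generalized relation produced from the same dataset.
    example = {
        "uniform": [5, 5, 5, 5],
        "skewed": [17, 1, 1, 1],
        "single": [20],
    }
    for name, score in rank_summaries(example):
        print(f"{name}: {score:.3f}")
```

Swapping in a different measure only requires replacing `shannon_diversity` with another function that maps a list of counts to a number.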