Measuring the interestingness of discovered knowledge: A principled approach

Authors:
Robert J. Hilderman;Howard J. Hamilton
Affiliations:
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2. E-mail: {Robert.Hilderman,Howard.Hamilton}@uregina.ca
Venue:
Intelligent Data Analysis
Year:
2003

Abstract

When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate twelve diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The twelve diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that the proposed principles define a partial order on the ranked summaries in most cases, and in some cases, define a total order. Theoretical results also show that seven of the twelve diversity measures satisfy all five principles. We empirically analyze the rank order of the summaries as determined by each of the twelve measures. These empirical results show that the measures tend to rank the less complex summaries as most interesting. We also analyze the distribution of the index values generated by each of the twelve diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. Finally, we demonstrate a technique, based upon our principles, for visualizing the relative interestingness of summaries. The objective of this work is to gain some insight into the behaviour that can be expected from our principled approach in practice.
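
As a rough illustration of ranking summaries with diversity measures, the sketch below scores each summary by the distribution of its tuple counts. The abstract does not name the twelve measures, so Shannon entropy (information theory) and Simpson's concentration index (ecology) are used here purely as familiar examples. The summary names, the counts, and the choice to sort by descending index value are all hypothetical and are not taken from the paper.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the proportions implied by the counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def simpson_concentration(counts):
    """Simpson's concentration index: the sum of squared proportions."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def rank_by(measure, summaries, reverse=True):
    """Order summary names by index value (descending by default).

    Which direction counts as "more interesting" is an assumption of this
    sketch, not something the abstract specifies.
    """
    return sorted(summaries, key=lambda name: measure(summaries[name]), reverse=reverse)


# Hypothetical summaries: each maps to the tuple counts of its rows after
# generalizing an attribute to some level of a taxonomic hierarchy.
summaries = {
    "by_province": [50, 30, 20],
    "by_city": [25, 25, 25, 25],
    "by_country": [100],
}

if __name__ == "__main__":
    for name in rank_by(shannon_entropy, summaries):
        print(name, round(shannon_entropy(summaries[name]), 3))
```

Different measures can order the same summaries differently. Here Simpson's index assigns its highest value to the single-row summary, while Shannon entropy assigns that summary its lowest value, which is one reason a shared set of principles for comparing measures is useful.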