When mining a large database, the number of patterns discovered can easily exceed a human user's capacity to identify the interesting results. To address this problem, various techniques have been suggested to reduce or order the patterns before presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. These measures have previously been used in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
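As a rough illustration of ranking summaries by a diversity measure, the sketch below scores each summary by the distribution of its tuple counts. The abstract does not name the thirteen measures, so Shannon entropy (information theory) and Simpson's concentration index (ecology) are used here only as assumed, representative examples. The function names, the summary names, and the choice to rank higher entropy first are all hypothetical.

```python
import math


def proportions(counts):
    """Convert raw tuple counts into proportions, skipping empty tuples."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a summary needs at least one counted tuple")
    return [c / total for c in counts if c > 0]


def shannon_entropy(counts):
    """Shannon entropy in bits: larger when counts are spread more evenly."""
    return -sum(p * math.log2(p) for p in proportions(counts))


def simpson_concentration(counts):
    """Simpson's index: larger when counts concentrate in few tuples."""
    return sum(p * p for p in proportions(counts))


def rank_summaries(summaries, measure, descending=True):
    """Order summary names by the index value the measure assigns.

    Whether a high or a low index value should rank first is an
    assumption of this sketch, not something the abstract states.
    """
    return sorted(summaries, key=lambda name: measure(summaries[name]),
                  reverse=descending)


# Hypothetical summaries: each maps to the counts of its generalized tuples.
summaries = {
    "by_region": [50, 50],
    "by_city": [97, 1, 1, 1],
    "by_country": [25, 25, 25, 25],
}

ranking = rank_summaries(summaries, shannon_entropy)
```

Swapping `shannon_entropy` for `simpson_concentration` (with `descending=False`) gives the same order on this toy input, but the two measures need not agree in general, which is one reason to compare the distributions of index values they produce.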