Interestingness PreProcessing

Authors:
Sigal Sahar
Affiliations:
-
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Citing 0
Cited 3

On Characterization and Discovery of Minimal Unexpected Patterns in Rule Discovery
IEEE Transactions on Knowledge and Data Engineering

Using importance flooding to identify interesting networks of criminal activity
ISI'06 Proceedings of the 4th IEEE international conference on Intelligence and Security Informatics

PARAS: a parameter space framework for online association mining
Proceedings of the VLDB Endowment

Abstract

As the size of databases increases, the number of rules mined from them also increases, often to an extent that overwhelms users. To address this problem, an important part of the KDD process is dedicated to determining which of these patterns are interesting. In this paper we define the Interestingness PreProcessing Step, and introduce a new framework for interestingness analysis. In a similar fashion to data preprocessing, this preprocessing should always be applied prior to interestingness processing. A strict requirement, and the biggest challenge, in defining Interestingness PreProcessing techniques is that the preprocessing must not eliminate any potentially interesting patterns. That is, the preprocessing methods must be domain-, task- and user-independent. This property differentiates the preprocessing methods from existing interestingness criteria and, since they can be applied automatically, makes them very useful. This generic nature also makes them rare: PreProcessing methods are very challenging to define. We also define in this paper the first two preprocessing techniques, and present the empirical results of applying them to six databases. The results indicate that the Interestingness PreProcessing Step is very powerful: in most cases, an average of half the rules mined were eliminated by the application of the two Interestingness PreProcessing techniques. These results are particularly significant since no user interaction is required to achieve them.