Interestingness PreProcessing

  • Authors: Sigal Sahar
  • Affiliations: -
  • Venue: ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
  • Year: 2001

Abstract

As the size of databases increases, the number of rules mined from them also increases, often to an extent that overwhelms users. To address this problem, an important part of the KDD process is dedicated to determining which of these patterns are interesting. In this paper we define the Interestingness PreProcessing Step, and introduce a new framework for interestingness analysis. In a similar fashion to data preprocessing, this preprocessing should always be applied prior to interestingness processing. A strict requirement, and the biggest challenge, in defining Interestingness PreProcessing techniques is that the preprocessing must not eliminate any potentially interesting patterns. That is, the preprocessing methods must be domain-, task-, and user-independent. This property differentiates the preprocessing methods from existing interestingness criteria and, since they can be applied automatically, makes them very useful. This generic nature also makes them rare: PreProcessing methods are very challenging to define. We also define in this paper the first two preprocessing techniques, and present the empirical results of applying them to six databases. The results indicate that the Interestingness PreProcessing Step is very powerful: in most cases, an average of half the rules mined were eliminated by the application of the two Interestingness PreProcessing techniques. These results are particularly significant since no user interaction is required to achieve them.