Improved Algorithms for Univariate Discretization of Continuous Features

  • Authors:
  • Jussi Kujala;Tapio Elomaa

  • Affiliations:
  • Institute of Software Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland;Institute of Software Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland

  • Venue:
  • PKDD 2007: Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2007

Abstract

Discretization divides the numerical value range of a continuous variable into a few intervals that are then used in classification. Naïve Bayes, for example, can benefit from this preprocessing. A commonly used supervised discretization method is Fayyad and Irani's recursive entropy-based splitting of a value range. The technique uses ent-mdl as a model selection criterion to decide whether to accept a proposed split. We argue that, theoretically, the method is not always close to ideal for this application. Empirical experiments support our finding. We give a statistical rule that does not use the ad hoc rule of Fayyad and Irani's approach to increase its performance. This rule, though, is quite time-consuming to compute. We also demonstrate that a very simple Bayesian method performs better than ent-mdl as a model selection criterion.
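
For orientation, the recursive entropy-based splitting mentioned in the abstract can be sketched roughly as follows. This is only a minimal illustration of the baseline Fayyad and Irani procedure with its MDL acceptance test as it is commonly stated; it is not the statistical rule or the Bayesian method proposed in the paper. The function names (`entropy`, `best_split`, `mdl_accepts`, `discretize`) are hypothetical, and placing each cut point at the midpoint between adjacent distinct values is an assumption made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (index, gain) of the boundary with the highest information gain.

    `values` must be sorted ascending. Only boundaries between distinct
    values are considered. Returns None if no boundary exists.
    """
    n = len(labels)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue
        left, right = labels[:i], labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[1]:
            best = (i, gain)
    return best


def mdl_accepts(labels, i, gain):
    """MDL acceptance test (assumed form of the Fayyad-Irani criterion)."""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def discretize(values, labels):
    """Recursively split the value range; return the sorted cut points."""
    pairs = sorted(zip(values, labels))
    vs = [v for v, _ in pairs]
    ls = [l for _, l in pairs]
    cuts = []

    def recurse(lo, hi):
        sub_v, sub_l = vs[lo:hi], ls[lo:hi]
        if len(sub_l) < 2:
            return
        found = best_split(sub_v, sub_l)
        if found is None:
            return
        i, gain = found
        if not mdl_accepts(sub_l, i, gain):
            return
        # Assumption: cut at the midpoint between the two adjacent values.
        cuts.append((sub_v[i - 1] + sub_v[i]) / 2)
        recurse(lo, lo + i)
        recurse(lo + i, hi)

    recurse(0, len(vs))
    return sorted(cuts)
```

In this sketch, a cleanly separable feature such as `discretize(list(range(20)), ["a"] * 10 + ["b"] * 10)` yields a single cut between the two classes, whereas a short alternating sequence is left unsplit because the gain does not exceed the MDL threshold.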