Technical Note: Bias in Information-Based Measures in Decision Tree Induction

  • Authors:
  • Allan P. White; Wei Zhong Liu

  • Affiliations:
  • Computer Centre, University of Birmingham, P.O. Box 363, Birmingham B15 2TT, United Kingdom. A.P.WHITE@BHAM.AC.UK; School of Mathematics and Statistics, University of Birmingham, P.O. Box 363, Birmingham B15 2TT, United Kingdom. W.Z.LIU@BHAM.AC.UK

  • Venue:
  • Machine Learning
  • Year:
  • 1994

Abstract

A fresh look is taken at the problem of bias in the information-based attribute selection measures used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures, such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991), are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
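The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental design: the sample size, the binary class, the uniformly distributed attribute, and the function names (`entropy`, `information_gain`, `mean_null_gain`) are all assumptions chosen for illustration. It generates attributes that are statistically independent of the class, so an unbiased measure would score them equally regardless of how many values they take, and reports the average information gain for several numbers of attribute values.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(attr, labels):
    """Information gain (bits) obtained by splitting `labels` on `attr`."""
    n = len(labels)
    base = entropy(list(Counter(labels).values()))
    remainder = 0.0
    for value, size in Counter(attr).items():
        # Class counts within the subset where the attribute equals `value`.
        subset = Counter(l for a, l in zip(attr, labels) if a == value)
        remainder += (size / n) * entropy(list(subset.values()))
    return base - remainder


def mean_null_gain(n_values, n_samples=200, n_trials=300, seed=0):
    """Average gain of an attribute that is independent of the class.

    Assumed setup (hypothetical, for illustration only): binary class,
    attribute drawn uniformly from `n_values` levels.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        labels = [rng.randrange(2) for _ in range(n_samples)]
        attr = [rng.randrange(n_values) for _ in range(n_samples)]
        total += information_gain(attr, labels)
    return total / n_trials


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, round(mean_null_gain(k), 4))
```

Under these assumptions the average gain grows with the number of attribute values even though no attribute carries any information about the class. One way to see why a chi-square-based approach compensates: for N cases, the quantity 2 N ln(2) times the information gain in bits equals the likelihood-ratio statistic G, which is approximately chi-square distributed with (k - 1)(c - 1) degrees of freedom for k attribute values and c classes, so referring it to that distribution accounts for the number of levels.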