Evaluating Training Data Suitability for Decision Tree Induction

  • Authors:
  • Kati Viikki;Martti Juhola;Ilmar Pyykkö;Pekka Honkavaara

  • Affiliations:
  • Department of Computer and Information Sciences, University of Tampere, Tampere, Finland;Department of Computer and Information Sciences, University of Tampere, Tampere, Finland;Department of Otorhinolaryngology, Karolinska Hospital, Stockholm, Sweden;Department of Anaesthesia, Otolaryngological Clinic, Helsinki University Central Hospital, Helsinki, Finland

  • Venue:
  • Journal of Medical Systems
  • Year:
  • 2001

Abstract

Decision tree induction, like other inductive learning methods, requires high-quality training data in order to generate accurate and reliable classification models. Example cases should form a representative sample of the application area, and the attributes used to describe them should be relevant and adequate for the classification task to be solved. In this paper, measures of the strength of association and an entropy-based approach are used to assess the quality of the training data. The classification tasks studied relate to three otological data sets: a conscript data set, a vertigo data set, and a postoperative nausea and vomiting data set. The paper suggests that the studied approaches give some guidance about the quality of the training data, but that other approaches are also needed to guide the construction of training data.
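
As a rough illustration of the two kinds of measure mentioned in the abstract, the sketch below scores a single categorical attribute against the class label. The abstract does not name the specific measures used, so Cramér's V (as a strength-of-association measure) and information gain (as an entropy-based measure) are assumptions chosen for illustration, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute, labels):
    """Reduction in class entropy after splitting on a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def cramers_v(attribute, labels):
    """Cramér's V computed from the chi-squared statistic of the contingency table."""
    n = len(labels)
    joint = Counter(zip(attribute, labels))
    row_totals = Counter(attribute)
    col_totals = Counter(labels)
    chi2 = 0.0
    for value in row_totals:
        for label in col_totals:
            expected = row_totals[value] * col_totals[label] / n
            chi2 += (joint[(value, label)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data, not taken from the paper's data sets.
classes = ["x", "x", "y", "y"]
relevant_attribute = ["a", "a", "b", "b"]    # determines the class exactly
irrelevant_attribute = ["a", "b", "a", "b"]  # independent of the class

for name, attr in [("relevant", relevant_attribute),
                   ("irrelevant", irrelevant_attribute)]:
    print(name, information_gain(attr, classes), cramers_v(attr, classes))
```

On this toy data the attribute that determines the class scores 1 on both measures and the independent attribute scores 0. Real training data would fall between these extremes, which is where such scores might help flag attributes that contribute little to the classification task.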