Evaluating Training Data Suitability for Decision Tree Induction

Authors:
Kati Viikki;Martti Juhola;Ilmar Pyykkö;Pekka Honkavaara
Affiliations:
Department of Computer and Information Sciences, University of Tampere, Tampere, Finland;Department of Computer and Information Sciences, University of Tampere, Tampere, Finland;Department of Otorhinolaryngology, Karolinska Hospital, Stockholm, Sweden;Department of Anaesthesia, Otolaryngological Clinic, Helsinki University Central Hospital, Helsinki, Finland
Venue:
Journal of Medical Systems
Year:
2001

Abstract

Decision tree induction, like other inductive learning methods, requires high-quality training data to generate accurate and reliable classification models. Example cases should form a representative sample of the application area, and the attributes used to describe them should be relevant and adequate for the classification task at hand. In this paper, measures of the strength of association and an entropy-based approach are used to assess the quality of the training data. The classification tasks studied relate to three otological data sets: a conscript data set, a vertigo data set, and a postoperative nausea and vomiting data set. The paper suggests that the studied approaches give some guidance on the quality of the training data, but that other approaches are also needed to guide the construction of training data.
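
To make the idea of an entropy-based check on attribute relevance concrete, the following is a minimal sketch and not the authors' procedure: the abstract does not specify which measures were used. It assumes Shannon entropy and information gain as the entropy-based quantities. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(attribute, labels):
    """H(class | attribute): class entropy left once the attribute value is known."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(attribute, labels):
    """Reduction in class entropy obtained by knowing the attribute."""
    return entropy(labels) - conditional_entropy(attribute, labels)


# Hypothetical toy data (not taken from the paper's data sets).
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # carries no information about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
print(gain_informative, gain_uninformative)
```

Under these assumptions, an attribute whose gain is close to the full class entropy is highly relevant to the task, while a gain near zero suggests the attribute adds little. If no attribute in the training data yields a substantial gain, that may indicate the attributes are inadequate for the classification task.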