A Comparison of Several Approaches to Missing Attribute Values in Data Mining

  • Authors: Jerzy W. Grzymala-Busse; Ming Hu
  • Venue: RSCTC '00 Revised Papers from the Second International Conference on Rough Sets and Current Trends in Computing
  • Year: 2000

Abstract

This paper presents and compares nine approaches to handling missing attribute values. Ten input data files were used to investigate the performance of the nine methods. For testing, both the naive classification and the new classification techniques of LERS (Learning from Examples based on Rough Sets) were used. The quality criterion was the average error rate achieved by ten-fold cross-validation. Using the Wilcoxon matched-pairs signed rank test, we conclude that the C4.5 approach and the method of ignoring examples with missing attribute values are the best of the nine approaches, that the most common attribute-value method is the worst, and that some methods do not differ significantly from others. Two further methods performed excellently in our limited experiments: assigning to the missing attribute value all possible values of the attribute, and assigning to it all possible values of the attribute restricted to the same concept. However, we do not have enough evidence to support the claim that these approaches are superior.
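
To make the simpler methods named in the abstract concrete, the following is a minimal sketch of three of them: ignoring examples with missing values, substituting the most common attribute value, and expanding an example into all possible values (optionally restricted to the same concept). It is an illustration under stated assumptions, not the authors' LERS implementation: the "?" marker, the function names, the row layout (attributes first, decision last), and the toy data are all hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product

MISSING = "?"  # assumed marker for a missing attribute value


def ignore_missing(rows):
    """Drop every example that has at least one missing attribute value."""
    return [row for row in rows if MISSING not in row[:-1]]


def most_common_value(rows):
    """Replace each missing value with the attribute's most frequent known value.

    Assumes every attribute has at least one known value.
    """
    n_attrs = len(rows[0]) - 1  # last column is the decision (concept)
    modes = []
    for j in range(n_attrs):
        counts = Counter(row[j] for row in rows if row[j] != MISSING)
        modes.append(counts.most_common(1)[0][0])
    return [
        tuple(modes[j] if row[j] == MISSING else row[j] for j in range(n_attrs))
        + (row[-1],)
        for row in rows
    ]


def all_possible_values(rows, same_concept=False):
    """Replace an example with one copy per known value of each missing attribute.

    With same_concept=True, candidate values are taken only from examples
    that share the decision value of the example being expanded. An example
    whose missing attribute has no known candidate value is dropped.
    """
    n_attrs = len(rows[0]) - 1
    expanded = []
    for row in rows:
        pool = [r for r in rows if r[-1] == row[-1]] if same_concept else rows
        choices = [
            sorted({r[j] for r in pool if r[j] != MISSING})
            if row[j] == MISSING
            else [row[j]]
            for j in range(n_attrs)
        ]
        expanded.extend(combo + (row[-1],) for combo in product(*choices))
    return expanded


# Toy data (hypothetical): (temperature, headache, decision)
rows = [
    ("high", "yes", "flu"),
    ("high", "?", "flu"),
    ("normal", "no", "healthy"),
    ("?", "no", "healthy"),
    ("high", "no", "healthy"),
]

complete_only = ignore_missing(rows)
imputed = most_common_value(rows)
expanded = all_possible_values(rows)
expanded_by_concept = all_possible_values(rows, same_concept=True)
```

On this toy table, ignoring incomplete examples shrinks the data set, while the expansion methods enlarge it; restricting candidates to the same concept yields fewer copies because only values observed alongside the same decision are used.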