A comparison of some rough set approaches to mining symbolic data with missing attribute values

  • Authors: Jerzy W. Grzymala-Busse
  • Affiliations: Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS and Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
  • Venue: ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems
  • Year: 2011

Abstract

This paper presents results of experiments on incomplete data sets obtained by random replacement of attribute values with symbols of missing attribute values. Rule sets were induced from such data using two different types of lower and upper approximation, local and global, and two different interpretations of missing attribute values: lost values and "do not care" conditions. Additionally, we used a probabilistic option, one of the most successful traditional methods to handle missing attribute values. In our experiments we recorded the total error rate, a result of ten-fold cross validation. Using the Wilcoxon matched-pairs signed-ranks test (5% level of significance, two-tailed test) we observed that for missing attribute values interpreted as "do not care" conditions, the global type of approximations is worse than the local type and that the probabilistic option is worse than the local type.
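
The data-preparation step mentioned in the abstract, random replacement of attribute values with missing-value symbols, can be illustrated with a minimal Python sketch. It is not the author's code. The function name `inject_missing`, the toy data, the fixed seed, and the rule of picking cells uniformly at random over the whole table are all assumptions made for illustration. The symbols `?` for lost values and `*` for "do not care" conditions are also assumed here, since the abstract does not name them.

```python
import random

LOST = "?"          # assumed symbol for a lost value
DO_NOT_CARE = "*"   # assumed symbol for a "do not care" condition


def inject_missing(rows, fraction, symbol=LOST, seed=0):
    """Return a copy of rows with the given fraction of cells replaced by symbol.

    rows holds attribute values only (no decision column). Cells are chosen
    uniformly at random without replacement; this is an illustrative choice,
    not necessarily the procedure used in the paper.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    masked = [list(row) for row in rows]  # copy so the input stays intact
    cells = [(i, j) for i, row in enumerate(masked) for j in range(len(row))]
    n_missing = round(fraction * len(cells))
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = symbol
    return masked


# Hypothetical toy data set: three symbolic attributes, four cases.
data = [
    ["high", "yes", "no"],
    ["very_high", "yes", "yes"],
    ["normal", "no", "no"],
    ["high", "no", "yes"],
]

# The same seed masks the same cells, so the two interpretations of
# missing values can be compared on otherwise identical data.
lost_version = inject_missing(data, 0.25, LOST, seed=1)
dnc_version = inject_missing(data, 0.25, DO_NOT_CARE, seed=1)
```

Reusing one seed for both symbols keeps the pattern of missing cells identical, so the only difference between the two data sets is the interpretation attached to the missing values.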