A comparison of some rough set approaches to mining symbolic data with missing attribute values

Authors:
Jerzy W. Grzymala-Busse
Affiliations:
Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS and Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Venue:
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems
Year:
2011

Abstract

This paper presents results of experiments on incomplete data sets obtained by random replacement of attribute values with symbols of missing attribute values. Rule sets were induced from such data using two different types of lower and upper approximation: local and global, and two different interpretations of missing attribute values: lost values and "do not care" conditions. Additionally, we used a probabilistic option, one of the most successful traditional methods to handle missing attribute values. In our experiments we recorded the total error rate, a result of ten-fold cross validation. Using the Wilcoxon matched-pairs signed ranks test (5% level of significance for two-tailed test) we observed that for missing attribute values interpreted as "do not care" conditions, the global type of approximations is worse than the local type and that the probabilistic option is worse than the local type.
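
The abstract does not define the two interpretations of missing values, so the following is only a minimal sketch of one formulation commonly used in the rough set literature on incomplete data. It assumes that a lost value ("?") keeps a case out of every attribute-value block, that a "do not care" condition ("*") places the case in every block of that attribute, and that lower and upper approximations of a concept are unions of characteristic sets. The symbols, the function names, and the four-case table are hypothetical illustrations, and the sketch does not reproduce the paper's local or global approximations, its rule induction, or its experiments.

```python
LOST, DONT_CARE = "?", "*"  # assumed symbols for the two interpretations


def blocks(table, attr):
    """Map each specified value v of attr to the set of cases in block [(attr, v)]."""
    values = {row[attr] for row in table.values()} - {LOST, DONT_CARE}
    result = {v: set() for v in values}
    for case, row in table.items():
        v = row[attr]
        if v == DONT_CARE:
            # "do not care": the case belongs to every block of this attribute
            for block in result.values():
                block.add(case)
        elif v != LOST:
            # specified value: the case belongs to exactly one block
            result[v].add(case)
        # lost value: the case belongs to no block of this attribute
    return result


def characteristic_set(table, case, attrs):
    """Intersect the blocks matching the specified values of `case`."""
    k = set(table)  # start from the whole universe of cases
    for a in attrs:
        v = table[case][a]
        if v not in (LOST, DONT_CARE):
            k &= blocks(table, a)[v]
    return k


def concept_approximations(table, concept, attrs):
    """Return (lower, upper) approximations of `concept` as unions of characteristic sets."""
    lower, upper = set(), set()
    for x in concept:
        k = characteristic_set(table, x, attrs)
        upper |= k
        if k <= concept:
            lower |= k
    return lower, upper


# Hypothetical four-case table; the concept is the set of cases with a positive decision.
table = {
    1: {"temperature": "high", "headache": "yes"},
    2: {"temperature": "?", "headache": "yes"},
    3: {"temperature": "normal", "headache": "*"},
    4: {"temperature": "normal", "headache": "no"},
}
attrs = ["temperature", "headache"]
concept = {1, 2}

lower, upper = concept_approximations(table, concept, attrs)
print(lower, upper)
```

In this toy table, case 2 (lost temperature) is constrained only by its headache value, so its characteristic set also contains cases 1 and 3 and it falls outside the lower approximation.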