A Comparison of Several Approaches to Missing Attribute Values in Data Mining

Authors:
Jerzy W. Grzymala-Busse; Ming Hu
Venue:
RSCTC '00 Revised Papers from the Second International Conference on Rough Sets and Current Trends in Computing
Year:
2000

Abstract

In this paper, nine different approaches to handling missing attribute values are presented and compared. Ten input data sets were used to investigate the performance of the nine methods. For testing, both the naive classification and the new classification techniques of LERS (Learning from Examples based on Rough Sets) were used. The quality criterion was the average error rate achieved by ten-fold cross-validation. Using the Wilcoxon matched-pairs signed-rank test, we conclude that the C4.5 approach and the method of ignoring examples with missing attribute values are the best of the nine approaches, and that the most common attribute value method is the worst; some methods do not differ significantly from one another. Based on our limited experimental results, the method of assigning all possible values of the attribute to a missing attribute value, and the same method with values restricted to those occurring in the same concept, are excellent approaches. However, we do not have enough evidence to support the claim that these approaches are superior.
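Several of the strategies named in the abstract are easy to state on a symbolic decision table. The Python sketch below is a rough illustration only. It implements ignoring incomplete examples, substituting the most common known attribute value, and replacing an incomplete example by one copy per possible value of the attribute, either over the whole table or restricted to the example's concept. The `?` marker, the list of `(attributes, decision)` pairs, the function names, the `per_concept` switch, and the toy table are all assumptions made for this example. They are not the authors' LERS code or data.

```python
from collections import Counter
from itertools import product

MISSING = "?"  # assumed marker for a missing attribute value


def known_values(examples, attr, concept=None):
    """Values observed for `attr`; if `concept` is given, only among examples of that concept."""
    return [attrs[attr] for attrs, decision in examples
            if attrs[attr] != MISSING and (concept is None or decision == concept)]


def ignore_missing(examples):
    """Drop every example that has at least one missing attribute value."""
    return [(attrs, d) for attrs, d in examples if MISSING not in attrs.values()]


def most_common_value(examples, per_concept=False):
    """Replace each missing value by the attribute's most frequent known value.

    With per_concept=True the count is taken only over examples of the same concept.
    Assumes at least one known value exists; ties go to the value seen first.
    """
    filled = []
    for attrs, d in examples:
        row = {}
        for attr, v in attrs.items():
            if v == MISSING:
                counts = Counter(known_values(examples, attr, d if per_concept else None))
                v = counts.most_common(1)[0][0]
            row[attr] = v
        filled.append((row, d))
    return filled


def all_possible_values(examples, per_concept=False):
    """Replace each incomplete example by one copy per combination of possible values.

    With per_concept=True only values seen in the same concept are used; in this sketch
    an example whose concept has no known value for the attribute yields no copies.
    """
    expanded = []
    for attrs, d in examples:
        choices = []
        for attr, v in attrs.items():
            if v == MISSING:
                values = sorted(set(known_values(examples, attr, d if per_concept else None)))
                choices.append([(attr, x) for x in values])
            else:
                choices.append([(attr, v)])
        for combo in product(*choices):
            expanded.append((dict(combo), d))
    return expanded


# Hypothetical toy decision table (not from the paper): (attributes, decision) pairs.
TOY_DATA = [
    ({"temperature": "high", "headache": "yes"}, "flu"),
    ({"temperature": MISSING, "headache": "yes"}, "flu"),
    ({"temperature": "normal", "headache": "no"}, "healthy"),
    ({"temperature": "normal", "headache": MISSING}, "healthy"),
    ({"temperature": "normal", "headache": "no"}, "healthy"),
    ({"temperature": "high", "headache": "yes"}, "flu"),
]

if __name__ == "__main__":
    print(len(ignore_missing(TOY_DATA)))                          # 4 complete examples
    print(most_common_value(TOY_DATA)[1])                         # temperature filled with "normal"
    print(most_common_value(TOY_DATA, per_concept=True)[1])       # temperature filled with "high"
    print(len(all_possible_values(TOY_DATA)))                     # 8 examples after expansion
    print(len(all_possible_values(TOY_DATA, per_concept=True)))   # 6 examples after expansion
```

On this toy table the table-wide variants fill the missing temperature of the `flu` example with `normal`, while the concept-restricted variants use `high`. This shows how restricting to the concept changes both the substituted value and the number of generated examples.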
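The conclusions rest on a Wilcoxon matched-pairs signed-rank test applied to the error rates of two methods on the same data sets. The following Python sketch computes the statistic W, the smaller of the positive and negative rank sums. It also computes an exact two-sided p-value by enumerating every sign assignment of the ranks. Zero differences are dropped, and tied absolute differences receive average ranks. The function name and the error rates are hypothetical. The abstract also does not say whether the authors used an exact table or a normal approximation, so the exact enumeration here is an assumption.

```python
def signed_rank_test(errors_a, errors_b):
    """Exact two-sided Wilcoxon matched-pairs signed-rank test (illustrative sketch).

    Returns (W, p): W is the smaller of the positive and negative rank sums.
    """
    # Paired differences; rounding guards against float noise when detecting ties/zeros.
    diffs = [round(a - b, 10) for a, b in zip(errors_a, errors_b)]
    diffs = [d for d in diffs if d != 0]  # zero differences are discarded
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank |d| with average ranks for ties, stored doubled so every rank is an integer.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * average of ranks i+1 .. j+1
        i = j + 1

    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w_minus2 = sum(r for r, d in zip(ranks2, diffs) if d < 0)
    w2 = min(w_plus2, w_minus2)

    # Exact null distribution of the positive rank sum: each sign is +/- with prob. 1/2.
    counts = {0: 1}
    for r in ranks2:
        updated = dict(counts)
        for s, c in counts.items():
            updated[s + r] = updated.get(s + r, 0) + c
        counts = updated
    lower_tail = sum(c for s, c in counts.items() if s <= w2)
    p = min(1.0, 2 * lower_tail / 2 ** n)
    return w2 / 2, p


# Hypothetical per-data-set error rates of two methods (invented for illustration).
ERRORS_A = [0.12, 0.20, 0.08, 0.31, 0.15, 0.22, 0.05, 0.18, 0.27, 0.10]
ERRORS_B = [0.14, 0.25, 0.09, 0.35, 0.21, 0.29, 0.08, 0.26, 0.36, 0.20]

if __name__ == "__main__":
    w, p = signed_rank_test(ERRORS_A, ERRORS_B)
    print(f"W = {w}, two-sided p = {p:.4f}")
```

In the invented example, method A has the lower error rate on all ten data sets. W is therefore 0, and the exact two-sided p-value is 2/1024, about 0.002. This is the smallest value attainable with ten nonzero differences, which illustrates how few data sets such a comparison has to work with.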