The substitution of missing values, also called imputation, is an important data preparation task for data mining applications. Imputation algorithms have traditionally been compared in terms of the similarity between imputed and original values. However, this traditional approach, sometimes referred to as prediction ability, does not allow one to infer the influence of imputed values on the ultimate modeling task (e.g., classification). Through an extensive experimental study, we examine the influence on classification problems of five nearest-neighbor-based imputation algorithms (KNNImpute, SKNN, IKNNImpute, KMI, and EACImpute) and two simple algorithms widely used in practice (Mean Imputation and the Majority Method). To assess these algorithms experimentally, missing values were simulated on six datasets under two missingness mechanisms: Missing Completely at Random (MCAR) and Missing at Random (MAR). Under MAR, the probability of missingness may depend on observed data but not on missing data, whereas under MCAR it depends on neither. The quality of the imputed values is assessed by two measures: prediction ability and classification bias. Experimental results show that IKNNImpute outperforms the other algorithms under the MCAR mechanism, whereas KNNImpute, SKNN, and EACImpute provide the best results under the MAR mechanism. Finally, our experiments also show that the best prediction results (in terms of mean squared error) do not necessarily lead to the least classification bias.
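To make the two missingness mechanisms and the general flavor of nearest-neighbor imputation concrete, here is a minimal Python sketch. It does not reproduce the specific algorithms evaluated in the paper (KNNImpute, SKNN, IKNNImpute, KMI, EACImpute); the function names `mcar_mask`, `mar_mask`, and `knn_impute` are illustrative inventions, and the imputer is a plain k-nearest-neighbor mean imputation given only as an assumed, simplified stand-in.

```python
import numpy as np

def mcar_mask(X, rate, rng):
    # MCAR: every entry is missing with the same probability,
    # independent of both observed and unobserved values.
    return rng.random(X.shape) < rate

def mar_mask(X, rate, rng):
    # MAR: missingness in columns 1..p-1 depends only on an
    # always-observed covariate (column 0), never on missing data.
    mask = np.zeros(X.shape, dtype=bool)
    driver = X[:, 0]
    p = 2.0 * rate * (driver > np.median(driver))  # high driver -> more missing
    for j in range(1, X.shape[1]):
        mask[:, j] = rng.random(X.shape[0]) < p
    return mask

def knn_impute(X, mask, k=3):
    # Fill each missing entry with the mean of that attribute over the
    # k nearest rows, using Euclidean distance on attributes observed
    # in both rows (masked values are never used).
    Xi = X.copy()
    n = X.shape[0]
    for i in range(n):
        miss = np.where(mask[i])[0]
        if miss.size == 0:
            continue
        obs = ~mask[i]
        dists = []
        for j in range(n):
            if j == i:
                continue
            common = obs & ~mask[j]
            if not common.any():
                continue
            d = np.sqrt(((X[i, common] - X[j, common]) ** 2).mean())
            dists.append((d, j))
        dists.sort()
        neighbors = [j for _, j in dists[:k]]
        for c in miss:
            donors = [X[j, c] for j in neighbors if not mask[j, c]]
            # Fall back to the observed column mean if no neighbor
            # has this attribute observed.
            Xi[i, c] = np.mean(donors) if donors else X[~mask[:, c], c].mean()
    return Xi
```

In this framing, "prediction ability" would compare `Xi[mask]` against the true `X[mask]` (e.g., by mean squared error), while "classification bias" would instead compare a classifier trained on `Xi` against one trained on the complete data — the paper's point being that the two measures need not agree.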