The substitution of missing values, also called imputation, is an important data preparation task in many domains. Ideally, imputation should not introduce bias into the dataset. This aspect has usually been assessed through measures of the prediction capability of imputation methods. Such measures rely on simulating missing entries for attributes whose values are actually known: the artificially missing values are imputed and then compared with the original ones. Although this evaluation is useful, it does not reveal how imputed values influence the ultimate modelling task (e.g. classification). We argue that imputation cannot be properly evaluated apart from the modelling task, and that alternative approaches are therefore needed. This article elaborates on the influence of imputed values in classification. In particular, it describes a practical procedure for estimating the inserted bias. As an additional contribution, we use this procedure to illustrate empirically the performance of three imputation methods (majority, naive Bayes and Bayesian networks) on three datasets, with three classifiers (decision tree, naive Bayes and nearest neighbours) as modelling tools. The results illustrate a variety of situations that can arise in data preparation practice.
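
The two kinds of evaluation contrasted above can be illustrated with a minimal sketch. It is not the procedure described in the article, whose details are not given here. It assumes a toy categorical dataset, majority imputation, a 1-nearest-neighbour classifier with Hamming distance, and a simple "accuracy shift" (accuracy with imputed training data minus accuracy with complete training data) as a rough stand-in for the influence of imputation on classification. All function and variable names are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # marker for a missing entry (an assumption of this sketch)


def mask_values(rows, attr, n_missing, rng):
    """Copy rows and set n_missing randomly chosen entries of column attr to MISSING."""
    masked = [list(r) for r in rows]
    idx = rng.sample(range(len(rows)), n_missing)
    for i in idx:
        masked[i][attr] = MISSING
    return masked, idx


def impute_majority(rows, attr):
    """Replace each MISSING entry of column attr with the most frequent known value."""
    known = [r[attr] for r in rows if r[attr] is not MISSING]
    mode = Counter(known).most_common(1)[0][0]
    return [[mode if j == attr and v is MISSING else v for j, v in enumerate(r)]
            for r in rows]


def prediction_capability(original, imputed, attr, idx):
    """Fraction of artificially removed values that the imputation recovers."""
    return sum(original[i][attr] == imputed[i][attr] for i in idx) / len(idx)


def nn_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a 1-nearest-neighbour classifier using Hamming distance."""
    def predict(x):
        nearest = min(range(len(train_x)),
                      key=lambda i: sum(a != b for a, b in zip(train_x[i], x)))
        return train_y[nearest]
    return sum(predict(x) == y for x, y in zip(test_x, test_y)) / len(test_y)


# Toy data: two categorical attributes; the class equals attribute 0.
grid = [[a0, a1] for a0 in range(3) for a1 in range(3)]
train_x = [list(r) for r in grid * 10]   # 90 complete training rows
train_y = [r[0] for r in train_x]
test_x = [list(r) for r in grid]
test_y = [r[0] for r in test_x]

rng = random.Random(0)
masked, idx = mask_values(train_x, attr=0, n_missing=27, rng=rng)
imputed = impute_majority(masked, attr=0)

# Evaluation 1: compare imputed values with the known originals.
capability = prediction_capability(train_x, imputed, 0, idx)

# Evaluation 2: compare classifier accuracy with and without imputation.
acc_complete = nn_accuracy(train_x, train_y, test_x, test_y)
acc_imputed = nn_accuracy(imputed, train_y, test_x, test_y)

print("prediction capability:", capability)
print("accuracy, complete training data:", acc_complete)
print("accuracy, imputed training data:", acc_imputed)
print("accuracy shift:", acc_imputed - acc_complete)
```

The first measure looks only at the imputed attribute, whereas the second depends on the classifier and the test data, which is why the two need not agree.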