Bayesian networks for imputation in classification problems

Authors:
Estevam R. Hruschka, Jr.; Eduardo R. Hruschka; Nelson F. Ebecken
Affiliations:
UFSCar/Federal University of São Carlos, São Carlos, Brazil; Catholic University of Santos (UniSantos), Santos, Brazil; COPPE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Venue:
Journal of Intelligent Information Systems
Year:
2007

Abstract

Missing values are an important problem in data mining. To tackle this problem in classification tasks, we propose two imputation methods based on Bayesian networks. These methods are evaluated in the context of both prediction and classification tasks. We compare the obtained results with those achieved by classical imputation methods (Expectation-Maximization, Data Augmentation, Decision Trees, and Mean/Mode). Our simulations were performed on four datasets (Congressional Voting Records, Mushroom, Wisconsin Breast Cancer, and Adult), which are benchmarks for data mining methods. Missing values were simulated in these datasets by eliminating some known values. Thus, it is possible to assess the prediction capability of an imputation method by comparing the original values with the imputed ones. In addition, we propose a methodology to estimate the bias introduced by imputation methods in classification tasks. To this end, we use four classifiers (One Rule, Naïve Bayes, J4.8 Decision Tree, and PART) to evaluate the employed imputation methods in classification scenarios. Computing times consumed to perform imputations are also reported. Simulation results in terms of prediction, classification, and computing times allow us to perform several analyses, leading to interesting conclusions. Bayesian networks have been shown to be competitive with classical imputation methods.
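
The evaluation idea in the abstract (remove known values, impute them, then compare the imputed values with the originals) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names (`mask_values`, `mode_impute`, `prediction_accuracy`), and the 25% missing rate are all assumptions made for illustration, and the simple mode imputer stands in for the Mean/Mode baseline rather than for the proposed Bayesian network methods.

```python
import random
from collections import Counter


def mask_values(rows, fraction, rng):
    """Hide a fraction of cells (set to None); return the masked copy and the hidden truth."""
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [list(row) for row in rows]  # copy so the original data stays intact
    truth = {}
    for i, j in hidden:
        truth[(i, j)] = masked[i][j]
        masked[i][j] = None
    return masked, truth


def mode_impute(masked):
    """Fill each missing cell with the most frequent observed value of its column."""
    modes = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def prediction_accuracy(imputed, truth):
    """Share of hidden cells whose imputed value equals the original value."""
    hits = sum(imputed[i][j] == value for (i, j), value in truth.items())
    return hits / len(truth)


# Hypothetical categorical toy data (not one of the benchmark datasets).
rows = [
    ["y", "n", "y"], ["y", "n", "n"], ["n", "y", "y"], ["y", "n", "y"],
    ["n", "y", "n"], ["y", "n", "y"], ["y", "y", "y"], ["n", "n", "y"],
]
masked, truth = mask_values(rows, fraction=0.25, rng=random.Random(0))
imputed = mode_impute(masked)
print(f"prediction accuracy on hidden cells: {prediction_accuracy(imputed, truth):.2f}")
```

Any other imputation method could be swapped in for `mode_impute` under the same protocol, which is what makes the original-versus-imputed comparison usable across methods.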