Imputation for categorical attributes with probabilistic reasoning

Authors:
Lian Jin;Hongzhi Wang;Hong Gao
Affiliations:
Department of Computer Science and Technology, Harbin Institute of Technology, China;Department of Computer Science and Technology, Harbin Institute of Technology, China;Department of Computer Science and Technology, Harbin Institute of Technology, China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

Because incomplete data limits how a dataset can be used, missing values in a database should be estimated so that data mining and analysis become more accurate. Beyond ignoring missing values or setting them to default values, many imputation methods have been proposed, but each has limitations. This paper proposes a probabilistic method for estimating missing values. We construct a Bayesian network in a novel way to identify the dependencies in a dataset, then use Bayesian reasoning to find the most probable substitute for each missing value. The benefits of this method are that (1) irrelevant attributes can be ignored during estimation; (2) the network is built without a target attribute, so all attributes are handled in one model; and (3) probability information is available to measure the accuracy of each imputation. Experimental results show that our construction algorithm is effective and that the filled values are of higher quality than those produced by mode imputation and the kNN method. We also verify experimentally the effectiveness of the probabilities given by our method.
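
The imputation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how the network is constructed, so the sketch assumes the structure is already given as a hypothetical `parents` mapping, estimates the conditional probability tables by counting with Laplace smoothing (an assumption), and handles one missing categorical value per row. The attribute names and toy data are invented for illustration.

```python
from collections import Counter, defaultdict


def fit_cpts(rows, parents, alpha=1.0):
    """Estimate smoothed conditional probability tables from complete rows.

    `parents` maps each attribute to the list of its parent attributes.
    The structure is assumed to be given; learning it is out of scope here.
    """
    domains = {a: sorted({r[a] for r in rows}) for a in parents}
    counts = {a: defaultdict(Counter) for a in parents}
    for r in rows:
        for a, ps in parents.items():
            counts[a][tuple(r[p] for p in ps)][r[a]] += 1

    def prob(attr, value, row):
        # P(attr = value | parents of attr as set in row), Laplace-smoothed.
        ctx = counts[attr][tuple(row[p] for p in parents[attr])]
        total = sum(ctx.values())
        return (ctx[value] + alpha) / (total + alpha * len(domains[attr]))

    return domains, prob


def impute(row, target, parents, domains, prob):
    """Return (most probable value, posterior probability) for row[target]."""
    scores = {}
    for v in domains[target]:
        cand = dict(row, **{target: v})
        p = 1.0
        for a in parents:
            # Joint probability factorises over the network's CPTs.
            p *= prob(a, cand[a], cand)
        scores[v] = p
    z = sum(scores.values())  # normalise over candidate values of the target
    best = max(scores, key=scores.get)
    return best, scores[best] / z


# Toy data: "wet" depends on "rain"; "day" has no edges (an irrelevant attribute).
rows = [
    {"rain": "yes", "wet": "yes", "day": "mon"},
    {"rain": "yes", "wet": "yes", "day": "tue"},
    {"rain": "yes", "wet": "yes", "day": "mon"},
    {"rain": "no", "wet": "no", "day": "tue"},
    {"rain": "no", "wet": "no", "day": "mon"},
    {"rain": "no", "wet": "yes", "day": "tue"},
]
parents = {"rain": [], "wet": ["rain"], "day": []}

domains, prob = fit_cpts(rows, parents)
value, confidence = impute(
    {"rain": None, "wet": "yes", "day": "mon"}, "rain", parents, domains, prob
)
print(value, round(confidence, 3))
```

In this toy network the factor for `day` is identical for every candidate value of `rain`, so it cancels during normalisation. That mirrors benefit (1): attributes with no dependency on the missing one do not influence the estimate. The returned posterior plays the role of the probability information in benefit (3).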