A Bayesian Approach for Estimating and Replacing Missing Categorical Data

Authors: Xiao-Bai Li
Affiliations: University of Massachusetts Lowell
Venue: Journal of Data and Information Quality (JDIQ)
Year: 2009

Abstract

We propose a new approach for estimating and replacing missing categorical data. With this approach, the posterior probability that a missing attribute value belongs to each category is estimated using the simple Bayes method. Two alternative methods for replacing the missing value are proposed: the first replaces the missing value with the category that has the maximum estimated posterior probability; the second replaces it with a category drawn at random according to the estimated posterior distribution. The approach is evaluated on data quality measures that are important for data warehousing and data mining, and the results of the experimental study demonstrate its effectiveness.
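
The two replacement rules described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`posterior`, `impute_max`, `impute_sample`), the toy records, and the use of add-one (Laplace) smoothing with parameter `alpha` are all assumptions introduced here for illustration. The sketch assumes a simple (naive) Bayes estimate in which the observed attributes are treated as conditionally independent given the category of the missing attribute.

```python
import random
from collections import Counter


def posterior(records, target, query, alpha=1.0):
    """Estimate P(target = c | observed attributes of query) with naive Bayes.

    records: list of dicts; None marks a missing value (an assumed encoding).
    alpha:   add-one style smoothing constant (an assumption, not from the paper).
    """
    # Train only on records where the target attribute is known.
    complete = [r for r in records if r.get(target) is not None]
    categories = sorted({r[target] for r in complete})
    prior = Counter(r[target] for r in complete)

    scores = {}
    for c in categories:
        rows = [r for r in complete if r[target] == c]
        # Smoothed prior P(c).
        score = (prior[c] + alpha) / (len(complete) + alpha * len(categories))
        for attr, value in query.items():
            if attr == target or value is None:
                continue  # skip the target itself and other missing attributes
            domain = {r[attr] for r in complete if r.get(attr) is not None}
            domain.add(value)
            match = sum(1 for r in rows if r.get(attr) == value)
            observed = sum(1 for r in rows if r.get(attr) is not None)
            # Smoothed likelihood P(attr = value | c).
            score *= (match + alpha) / (observed + alpha * len(domain))
        scores[c] = score

    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def impute_max(post):
    """First rule: pick the category with the largest estimated posterior."""
    return max(post, key=post.get)


def impute_sample(post, rng=random):
    """Second rule: draw a category with probability equal to its posterior."""
    cats = list(post)
    return rng.choices(cats, weights=[post[c] for c in cats], k=1)[0]


# Hypothetical toy data; the last record has a missing "play" value.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": None},
]
post = posterior(records, "play", records[-1])
filled_max = impute_max(post)
filled_sample = impute_sample(post, random.Random(0))
```

On these toy records the estimated posterior favors "yes" over "no" (roughly 0.6 versus 0.4), so the maximum-probability rule always fills in "yes", while the sampling rule fills in "no" about 40% of the time. The sampling rule trades some per-record accuracy for a filled-in column whose category frequencies stay closer to the estimated distribution.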