A Bayesian Approach for Estimating and Replacing Missing Categorical Data

  • Authors:
  • Xiao-Bai Li

  • Affiliations:
  • University of Massachusetts Lowell

  • Venue:
  • Journal of Data and Information Quality (JDIQ)
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a new approach for estimating and replacing missing categorical data. With this approach, the posterior probabilities of a missing attribute value belonging to a certain category are estimated using the simple Bayes method. Two alternative methods for replacing the missing value are proposed: The first replaces the missing value with the value having the estimated maximum probability; the second uses a value that is selected with probability proportional to the estimated posterior distribution. The effectiveness of the proposed approach is evaluated based on some important data quality measures for data warehousing and data mining. The results of the experimental study demonstrate the effectiveness of the proposed approach.