Data cleaning using belief propagation

  • Authors:
  • Fang Chu; Yizhou Wang; D. Stott Parker; Carlo Zaniolo

  • Affiliations:
  • University of California, Los Angeles; University of California, Los Angeles; University of California, Los Angeles; University of California, Los Angeles

  • Venue:
  • Proceedings of the 2nd international workshop on Information quality in information systems
  • Year:
  • 2005

Abstract

Effective data cleaning is critical in many applications where data quality is poor because of missing or inaccurate values. Fortunately, a wide spectrum of applications exhibits strong dependencies between data samples, and such dependencies can be used very effectively for cleaning the data. For example, the readings of nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. We propose a data cleaning approach based on modeling data dependencies with Markov networks. Belief propagation is used to efficiently compute the marginal or maximum a posteriori probabilities, so as to infer missing values or to correct errors. To illustrate the benefits and generality of the technique, we discuss its use in several applications and report on the resulting improvements in data quality.
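
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of sum-product belief propagation on a small pairwise Markov network. It is not the authors' implementation. The function name `belief_propagation`, the three-sensor chain, the binary states (0 = low, 1 = high), and all potential values are hypothetical choices made for this example. Two sensors report "high" with some uncertainty, the middle reading is missing, and the marginal belief at the middle node is used to fill it in.

```python
from math import prod


def belief_propagation(node_pot, edges, edge_pot, n_iters=10):
    """Sum-product BP on a pairwise Markov network with discrete states.

    node_pot: one list of local evidence potentials per node.
    edges:    undirected (i, j) pairs.
    edge_pot: edge_pot[a][b] is the compatibility of neighboring states
              a and b (assumed symmetric and shared by all edges).
    Returns the normalized belief (approximate marginal) of every node.
    """
    k = len(edge_pot)
    neighbors = {i: [] for i in range(len(node_pot))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # msgs[(i, j)] is the message from node i to node j, initially uniform.
    msgs = {(i, j): [1.0 / k] * k for i in neighbors for j in neighbors[i]}

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = []
            for xj in range(k):
                total = 0.0
                for xi in range(k):
                    # Product of messages into i from all neighbors except j.
                    incoming = prod(
                        msgs[(h, i)][xi] for h in neighbors[i] if h != j
                    )
                    total += node_pot[i][xi] * edge_pot[xi][xj] * incoming
                out.append(total)
            z = sum(out)
            new_msgs[(i, j)] = [v / z for v in out]
        msgs = new_msgs

    beliefs = []
    for i in range(len(node_pot)):
        b = [
            node_pot[i][x] * prod(msgs[(h, i)][x] for h in neighbors[i])
            for x in range(k)
        ]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


# Hypothetical data: sensors 0 and 2 read "high"; sensor 1 is missing.
node_pot = [[0.1, 0.9], [0.5, 0.5], [0.1, 0.9]]
edges = [(0, 1), (1, 2)]
edge_pot = [[0.8, 0.2], [0.2, 0.8]]  # neighboring sensors tend to agree

beliefs = belief_propagation(node_pot, edges, edge_pot)
# Fill in (or correct) each value with its most probable state.
cleaned = [max(range(len(b)), key=b.__getitem__) for b in beliefs]
```

On a tree-structured network such as this chain, the computed beliefs are exact marginals once the messages have converged. On networks with cycles the same updates give only an approximation, and convergence is not guaranteed.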