Data cleaning using belief propagation

Authors:
Fang Chu;Yizhou Wang;D. Stott Parker;Carlo Zaniolo
Affiliations:
University of California, Los Angeles;University of California, Los Angeles;University of California, Los Angeles;University of California, Los Angeles
Venue:
Proceedings of the 2nd international workshop on Information quality in information systems
Year:
2005


Abstract

Effective data cleaning is critical in many applications where data quality is poor because of missing or inaccurate values. Fortunately, a wide spectrum of applications exhibits strong dependencies between data samples, and these dependencies can be exploited to clean the data. For example, the readings of nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. We propose a data cleaning approach based on modeling data dependencies with Markov networks. Belief propagation is used to efficiently compute the marginal or maximum posterior probabilities, which are then used to infer missing values or to correct errors. To illustrate the benefits and generality of the technique, we discuss its use in several applications and report on the resulting improvements in data quality.
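
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic sum-product belief propagation routine on a small pairwise Markov network, written under several assumptions made only for illustration. The three-sensor chain, the two discrete states (0 = low, 1 = high), the potential values, and the names `belief_propagation`, `exact_marginals`, `node_pot`, and `edge_pot` are all hypothetical. A missing reading is modeled as a uniform node potential, and the cleaned value is taken as the state with the highest marginal belief.

```python
from itertools import product


def normalize(v):
    """Scale a list of non-negative weights so it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


def belief_propagation(node_pot, edges, edge_pot, n_iters=10):
    """Sum-product BP on a pairwise Markov network.

    node_pot[i][x] : local evidence for node i being in state x
    edges          : list of undirected (i, j) pairs
    edge_pot[a][b] : compatibility of neighbouring states a and b
                     (assumed symmetric and shared by all edges)
    Returns one normalized belief vector per node.
    """
    n, k = len(node_pot), len(node_pot[0])
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)] is the message sent from node i to node j.
    msgs = {(i, j): [1.0 / k] * k for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            # Local evidence at i times all incoming messages except j's.
            prod = list(node_pot[i])
            for m in neighbors[i]:
                if m != j:
                    prod = [p * q for p, q in zip(prod, msgs[(m, i)])]
            # Sum out the sender's state x_i.
            out = [sum(edge_pot[xi][xj] * prod[xi] for xi in range(k))
                   for xj in range(k)]
            new_msgs[(i, j)] = normalize(out)
        msgs = new_msgs
    beliefs = []
    for i in range(n):
        b = list(node_pot[i])
        for m in neighbors[i]:
            b = [p * q for p, q in zip(b, msgs[(m, i)])]
        beliefs.append(normalize(b))
    return beliefs


def exact_marginals(node_pot, edges, edge_pot):
    """Brute-force marginals by enumeration, used only as a sanity check."""
    n, k = len(node_pot), len(node_pot[0])
    marg = [[0.0] * k for _ in range(n)]
    for x in product(range(k), repeat=n):
        w = 1.0
        for i in range(n):
            w *= node_pot[i][x[i]]
        for i, j in edges:
            w *= edge_pot[x[i]][x[j]]
        for i in range(n):
            marg[i][x[i]] += w
    return [normalize(row) for row in marg]


# Hypothetical data: three sensors in a chain, states 0 = low, 1 = high.
node_pot = [
    [0.1, 0.9],  # sensor 0 reported "high" (assumed 90% reliable)
    [0.5, 0.5],  # sensor 1 reading is missing: uniform evidence
    [0.1, 0.9],  # sensor 2 reported "high"
]
edges = [(0, 1), (1, 2)]
edge_pot = [[0.8, 0.2],
            [0.2, 0.8]]  # neighbouring sensors tend to agree

beliefs = belief_propagation(node_pot, edges, edge_pot)
# Cleaned value per sensor: the state with the largest marginal belief.
cleaned = [max(range(len(b)), key=b.__getitem__) for b in beliefs]
print(beliefs)
print(cleaned)
```

Because the example graph is a chain, the computed beliefs coincide with the exact marginals, which `exact_marginals` confirms by enumeration. On graphs with cycles the same message updates give only an approximation. Replacing the sum over the sender's state with a maximum would yield the max-product variant, which corresponds to the maximum posterior case mentioned in the abstract.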