Effective data cleaning is critical in many applications where data quality is poor because of missing or inaccurate values. Fortunately, a wide spectrum of applications exhibit strong dependencies between data samples, and these dependencies can be exploited to clean the data. For example, readings from nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. We propose a data cleaning approach based on modeling data dependencies with Markov networks. Belief propagation is used to efficiently compute the marginal or maximum posterior probabilities, which in turn are used to infer missing values or to correct errors. To illustrate the benefits and generality of the technique, we discuss its use in several applications and report the resulting improvements in data quality.
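
As a rough illustration of the idea, the sketch below runs sum-product belief propagation on a tiny pairwise Markov network to fill in one missing sensor reading. It is not the authors' implementation. The function name `belief_propagation`, the three-sensor chain, the binary low/high states, and every potential value are assumptions chosen for the example. A single edge potential is shared by all edges, which a real model would likely not do.

```python
import numpy as np

def belief_propagation(node_pot, edges, edge_pot, n_iters=20):
    """Sum-product BP on a pairwise Markov network (hypothetical helper).

    node_pot: (n_nodes, n_states) local evidence per node
    edges:    list of undirected (i, j) pairs
    edge_pot: (n_states, n_states) compatibility matrix shared by all edges
    Returns an (n_nodes, n_states) array of marginals (exact on trees,
    approximate on graphs with loops).
    """
    node_pot = np.asarray(node_pot, dtype=float)
    n_nodes, n_states = node_pot.shape

    neighbors = {v: [] for v in range(n_nodes)}
    msgs = {}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
        # One message per directed edge, initialised to uniform.
        msgs[(i, j)] = np.full(n_states, 1.0 / n_states)
        msgs[(j, i)] = np.full(n_states, 1.0 / n_states)

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            # Combine evidence at i with messages into i from all neighbors but j.
            prod = node_pot[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod *= msgs[(k, i)]
            # Sum over the states of i, weighted by the edge compatibility.
            m = edge_pot.T @ prod
            new_msgs[(i, j)] = m / m.sum()
        msgs = new_msgs

    beliefs = node_pot.copy()
    for v in range(n_nodes):
        for k in neighbors[v]:
            beliefs[v] *= msgs[(k, v)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Assumed toy data: three sensors in a chain, states 0 = low, 1 = high.
# Sensors 0 and 2 report "high" (trusted at 0.9); sensor 1 is missing,
# so its local evidence is uniform.
node_pot = np.array([[0.1, 0.9],
                     [0.5, 0.5],
                     [0.1, 0.9]])
edges = [(0, 1), (1, 2)]
# Assumed smoothness prior: neighboring sensors agree with weight 0.8.
edge_pot = np.array([[0.8, 0.2],
                     [0.2, 0.8]])

marginals = belief_propagation(node_pot, edges, edge_pot)
cleaned = marginals.argmax(axis=1)  # most probable state per sensor
print(marginals)
print(cleaned)
```

Here the missing reading is filled in with the state of highest marginal probability, which is driven entirely by the two neighboring sensors. Correcting an erroneous value works the same way: a suspicious reading keeps its local evidence, and it is overridden only when the neighbors' messages outweigh it.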