Probabilistic reasoning in intelligent systems: networks of plausible inference
Probabilistic reasoning in intelligent systems: networks of plausible inference
Exploratory Data Mining and Data Cleaning
Exploratory Data Mining and Data Cleaning
Dependency networks for inference, collaborative filtering, and data visualization
The Journal of Machine Learning Research
Probabilistic Noise Identification and Data Cleaning
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Measures of distributional similarity
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
TinyDB: an acquisitional query processing system for sensor networks
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
A cost-based model and effective heuristic for repairing constraints by value modification
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Data cleaning using belief propagation
Proceedings of the 2nd international workshop on Information quality in information systems
Clean Answers over Dirty Databases: A Probabilistic Approach
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Relational Dependency Networks
The Journal of Machine Learning Research
Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)
Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)
Improving data quality: consistency and accuracy
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Querying continuous functions in a database system
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A revival of integrity constraints for data cleaning
Proceedings of the VLDB Endowment
MAD skills: new analysis practices for big data
Proceedings of the VLDB Endowment
Interaction between record matching and data repairing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Functional dependency discovery via Bayes net analysis
MAMECTIS/NOLASC/CONTROL/WAMUS'11 Proceedings of the 13th WSEAS international conference on mathematical methods, computational techniques and intelligent systems, and 10th WSEAS international conference on non-linear analysis, non-linear systems and chaos, and 7th WSEAS international conference on dynamical systems and control, and 11th WSEAS international conference on Wavelet analysis and multirate systems: recent researches in computational techniques, non-linear systems and control
Relational approach for shortest path discovery over large graphs
Proceedings of the VLDB Endowment
Statistical distortion: consequences of data cleaning
Proceedings of the VLDB Endowment
Xtream: a system for continuous querying over uncertain data streams
SUM'12 Proceedings of the 6th international conference on Scalable Uncertainty Management
Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
NADEEF: a commodity data cleaning system
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Probabilistic graph summarization
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Hi-index | 0.00 |
Real-world databases often contain syntactic and semantic errors, in spite of integrity constraints and other safety measures incorporated into modern DBMSs. We present ERACER, an iterative statistical framework for inferring missing information and correcting such errors automatically. Our approach is based on belief propagation and relational dependency networks, and includes an efficient approximate inference algorithm that is easily implemented in standard DBMSs using SQL and user defined functions. The system performs the inference and cleansing tasks in an integrated manner, using shrinkage techniques to infer correct values accurately even in the presence of dirty data. We evaluate the proposed methods empirically on multiple synthetic and real-world data sets. The results show that our framework achieves accuracy comparable to a baseline statistical method using Bayesian networks with exact inference. However, our framework has wider applicability than the Bayesian network baseline, due to its ability to reason with complex, cyclic relational dependencies.