In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback into the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to improve data quality. GDR also uses machine learning methods to identify correct updates and apply them directly to the database, without involving the user in those specific updates. To rank potential updates for consultation with the user, we first group these repairs and quantify the utility of each group using the decision-theoretic concept of value of information (VOI). We then apply active learning to order the updates within a group by their ability to improve the learned model. User feedback is used both to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show a significant improvement in data quality using our user-guided repair process. We also assess the trade-off between user effort and the resulting data quality.
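The two-level ranking described above (groups first, then updates within a group) can be illustrated with a minimal sketch. This is not GDR's actual formulation: the `Update` class, the `p_correct` and `gain` fields, and both scoring functions are hypothetical. The group score is a simplified stand-in for VOI (expected quality gain), and the within-group order uses plain uncertainty sampling, one common active learning strategy.

```python
from dataclasses import dataclass


@dataclass
class Update:
    name: str
    p_correct: float  # hypothetical: model's estimate that the update is correct
    gain: float       # hypothetical: quality gain if the update is correct


def group_benefit(updates):
    """Simplified stand-in for VOI: expected quality gain of the whole group."""
    return sum(u.p_correct * u.gain for u in updates)


def uncertainty(update):
    """Uncertainty-sampling score: 1.0 at p_correct = 0.5, 0.0 at 0 or 1."""
    return 1.0 - 2.0 * abs(update.p_correct - 0.5)


def rank_for_consultation(groups):
    """Order groups by expected benefit, then updates by model uncertainty."""
    ordered = sorted(groups.items(),
                     key=lambda item: group_benefit(item[1]),
                     reverse=True)
    return [(name, sorted(updates, key=uncertainty, reverse=True))
            for name, updates in ordered]


# Illustrative data only; the groups and numbers are made up.
groups = {
    "zip": [Update("u1", 0.9, 1.0), Update("u2", 0.5, 1.0)],
    "city": [Update("u3", 0.6, 3.0)],
}
ranked = rank_for_consultation(groups)
for group_name, updates in ranked:
    print(group_name, [u.name for u in updates])
```

In this toy input the "city" group is shown to the user first because its expected gain is higher, and within "zip" the update the model is least sure about comes first, since its label would be the most informative for retraining.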