Matching dependencies (MDs) have recently been proposed for data quality applications such as detecting violations of integrity constraints and identifying duplicate objects. In this paper, we study the problem of discovering matching dependencies from a given database instance. First, we formally define two measures, support and confidence, for evaluating the utility of MDs on that instance. Then, we study the discovery of MDs that satisfy given utility requirements on support and confidence. We develop exact algorithms, together with pruning strategies that improve their running time. Finally, our experimental evaluation demonstrates the efficiency of the proposed methods.
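
To make the two measures concrete, the following is a minimal sketch of how support and confidence might be computed for a single candidate MD over all tuple pairs. It assumes an association-rule-style reading (support as the fraction of pairs that are similar on both sides, confidence as the fraction of left-side-similar pairs that are also similar on the right side); the formal definitions in the paper may differ. The names `similar` and `md_support_confidence`, the character-set Jaccard similarity, and the thresholds are all hypothetical choices made for illustration.

```python
from itertools import combinations


def similar(a, b, threshold):
    """Hypothetical similarity test: Jaccard overlap of character sets."""
    sa, sb = set(a.lower()), set(b.lower())
    if not sa and not sb:
        return True  # two empty values are treated as identical
    return len(sa & sb) / len(sa | sb) >= threshold


def md_support_confidence(rows, lhs, rhs, lhs_threshold, rhs_threshold):
    """Assumed measures for the candidate MD lhs -> rhs.

    support    = (# pairs similar on lhs AND rhs) / (# all pairs)
    confidence = (# pairs similar on lhs AND rhs) / (# pairs similar on lhs)
    """
    pairs = list(combinations(rows, 2))
    lhs_hits = 0
    both_hits = 0
    for r1, r2 in pairs:
        if all(similar(r1[a], r2[a], lhs_threshold) for a in lhs):
            lhs_hits += 1
            if all(similar(r1[a], r2[a], rhs_threshold) for a in rhs):
                both_hits += 1
    support = both_hits / len(pairs) if pairs else 0.0
    confidence = both_hits / lhs_hits if lhs_hits else 0.0
    return support, confidence


# Toy instance (made up for illustration, not data from the paper).
rows = [
    {"name": "abc", "city": "xyz"},
    {"name": "abc", "city": "xyz"},
    {"name": "abc", "city": "pqr"},
    {"name": "def", "city": "xyz"},
]
support, confidence = md_support_confidence(rows, ["name"], ["city"], 1.0, 1.0)
```

Enumerating every pair is quadratic in the number of tuples and the candidate space of thresholds is large, which is the kind of cost that pruning strategies are meant to reduce; this sketch does no pruning.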