To match records accurately, it is often necessary to utilize the semantics of the data. Functional dependencies (FDs) have proven useful in identifying tuples in a clean relation, based on the semantics of the data. For all the reasons that FDs and their inference are needed, it is also important to develop dependencies and their reasoning techniques for matching tuples from unreliable data sources. This paper investigates dependencies and their reasoning for record matching. (a) We introduce a class of matching dependencies (MDs) for specifying the semantics of data in unreliable relations, defined in terms of similarity metrics and a dynamic semantics. (b) We identify a special case of MDs, referred to as relative candidate keys (RCKs), to determine what attributes to compare and how to compare them when matching records across possibly different relations. (c) We propose a mechanism for inferring MDs, a departure from traditional implication analysis, such that when we cannot match records by comparing attributes that contain errors, we may still find matches by using other, more reliable attributes. (d) We provide an O(n^2) time algorithm for inferring MDs, and an effective algorithm for deducing a set of RCKs from MDs. (e) We experimentally verify that the algorithms help matching tools efficiently identify keys at compile time for matching, blocking or windowing, and that the techniques effectively improve both the quality and efficiency of various record matching methods.
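To make item (b) concrete, the following is a minimal Python sketch of how a relative candidate key might be applied when comparing records from two relations. It is an illustration only, not the paper's algorithm: the attribute names, the thresholds, and the use of `difflib.SequenceMatcher` as the similarity metric are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def equal(a, b):
    """Exact-equality comparison operator."""
    return a == b


def similar(threshold):
    """Build a similarity operator: true when the ratio reaches the threshold.

    SequenceMatcher is an assumed stand-in for whatever similarity metric
    a real matching tool would use.
    """
    def op(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    return op


# Hypothetical relative candidate key: each entry names an attribute of the
# first relation, the corresponding attribute of the second relation, and
# the operator used to compare them.
RCK = [
    ("last_name", "surname", equal),
    ("address", "post_addr", similar(0.8)),
    ("first_name", "given_name", similar(0.7)),
]


def matches(r1, r2, key):
    """Two records match relative to the key if every comparison holds."""
    return all(op(r1[a], r2[b]) for a, b, op in key)


# Hypothetical records from two relations with different schemas.
credit = {
    "first_name": "Mark",
    "last_name": "Smith",
    "address": "10 Oak Street, Edinburgh",
}
billing = {
    "given_name": "Marx",
    "surname": "Smith",
    "post_addr": "10 Oak St, Edinburgh",
}
other = dict(billing, surname="Jones")

print(matches(credit, billing, RCK))  # True
print(matches(credit, other, RCK))    # False
```

The key lists which attribute pairs to compare and which operator to use for each pair, which is the role the abstract assigns to RCKs. A tool could hold several such keys and declare a match as soon as any one of them is satisfied, so that an error in one attribute does not block a match found through other, more reliable attributes.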