Reasoning about record matching rules

Authors:
Wenfei Fan; Xibei Jia; Jianzhong Li; Shuai Ma
Affiliations:
University of Edinburgh and Bell Laboratories and Harbin Institute of Technology; University of Edinburgh; Harbin Institute of Technology; University of Edinburgh
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

To accurately match records it is often necessary to utilize the semantics of the data. Functional dependencies (FDs) have proven useful in identifying tuples in a clean relation, based on the semantics of the data. For the same reasons that FDs and their inference are needed, it is also important to develop dependencies and their reasoning techniques for matching tuples from unreliable data sources. This paper investigates dependencies and their reasoning for record matching. (a) We introduce a class of matching dependencies (MDs) for specifying the semantics of data in unreliable relations, defined in terms of similarity metrics and a dynamic semantics. (b) We identify a special case of MDs, referred to as relative candidate keys (RCKs), to determine which attributes to compare and how to compare them when matching records across possibly different relations. (c) We propose a mechanism for inferring MDs, a departure from traditional implication analysis, such that when we cannot match records by comparing attributes that contain errors, we may still find matches by using other, more reliable attributes. (d) We provide an O(n²) time algorithm for inferring MDs, and an effective algorithm for deducing a set of RCKs from MDs. (e) We experimentally verify that the algorithms help matching tools efficiently identify keys at compile time for matching, blocking or windowing, and that the techniques effectively improve both the quality and efficiency of various record matching methods.
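The idea in (c) and (d), that an MD which identifies some attribute values can let other, more reliable attributes stand in for erroneous ones, can be illustrated with a fixpoint closure in the spirit of FD attribute closure. The sketch below is a simplified illustration under stated assumptions, not the paper's inference procedure or its O(n²) algorithm: an MD is modeled as a list of comparisons (attribute pair plus operator) on the left and attribute pairs to identify on the right; identified values are treated as equal; similarity operators are assumed reflexive, so equality implies them. The schema (`card`, `billing`), the operator label `~d`, and the names `MD`, `closure`, and `deduces` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]        # (attribute of R1, attribute of R2)
Cmp = Tuple[str, str, str]    # (attribute of R1, attribute of R2, operator)
Facts = Dict[Pair, Set[str]]  # operators known to hold for each attribute pair


@dataclass
class MD:
    """Simplified MD: if every LHS comparison holds, identify the RHS pairs."""
    lhs: List[Cmp]
    rhs: List[Pair]


def holds(cmp: Cmp, facts: Facts) -> bool:
    a, b, op = cmp
    ops = facts.get((a, b), set())
    # Assumption: similarity operators are reflexive, so equality implies them.
    return op in ops or "=" in ops


def closure(start: List[Cmp], mds: List[MD]) -> Facts:
    """Apply MDs until no new attribute pair gets identified (fixpoint)."""
    facts: Facts = {}
    for a, b, op in start:
        facts.setdefault((a, b), set()).add(op)
    changed = True
    while changed:
        changed = False
        for md in mds:
            if all(holds(c, facts) for c in md.lhs):
                for pair in md.rhs:
                    ops = facts.setdefault(pair, set())
                    if "=" not in ops:
                        ops.add("=")  # identified values are treated as equal
                        changed = True
    return facts


def deduces(start: List[Cmp], mds: List[MD], target_lhs: List[Cmp]) -> bool:
    """True if comparing `start` lets the MDs establish every target comparison."""
    facts = closure(start, mds)
    return all(holds(c, facts) for c in target_lhs)


# Hypothetical schema: card(LN, FN, addr, tel, email), billing(SN, FN, post, phn, email).
# Matching on this LHS is assumed to identify the same card holder; "~d" is a
# hypothetical similarity operator (e.g. an edit-distance threshold on first names).
target = [("LN", "SN", "="), ("addr", "post", "="), ("FN", "FN", "~d")]
sigma = [
    MD(lhs=[("tel", "phn", "=")], rhs=[("addr", "post")]),
    MD(lhs=[("email", "email", "=")], rhs=[("LN", "SN"), ("FN", "FN")]),
]

print(deduces([("LN", "SN", "="), ("tel", "phn", "="), ("FN", "FN", "~d")], sigma, target))  # → True
print(deduces([("email", "email", "="), ("tel", "phn", "=")], sigma, target))  # → True
print(deduces([("LN", "SN", "="), ("FN", "FN", "~d")], sigma, target))  # → False
```

In this toy setting, a possibly unreliable `addr`/`post` comparison can be replaced by an exact phone match, and comparing `email` together with `tel`/`phn` recovers the whole target key, which is the sense in which such derived comparison lists behave like candidate keys for matching.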