This paper investigates constraints for matching records from unreliable data sources. (a) We introduce a class of matching dependencies (MDs) for specifying the semantics of unreliable data. Unlike static constraints for schema design, MDs are developed for record matching and are defined in terms of similarity predicates and a dynamic semantics. (b) We identify a special case of MDs, referred to as relative candidate keys (RCKs), which determine what attributes to compare and how to compare them when matching records across possibly different relations. (c) We propose a mechanism for inferring MDs, a departure from traditional implication analysis, so that when records cannot be matched by comparing attributes that contain errors, matches may still be found by using other, more reliable attributes. Moreover, we develop a sound and complete system for inferring MDs. (d) We provide a quadratic-time algorithm for inferring MDs and an effective algorithm for deducing a set of high-quality RCKs from MDs. (e) We experimentally verify that the algorithms help matching tools efficiently identify keys at compile time for matching, blocking, or windowing, and that the MD-based techniques effectively improve the quality and efficiency of various record matching methods.
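To make the notions in (a) and (b) concrete, the following is a minimal Python sketch of how a matching dependency and a relative candidate key might be represented and applied to a pair of records. It is an illustration under stated assumptions, not the paper's formalism or algorithms: the names `MatchingDependency`, `similar`, `equal`, and `matches_by_rck`, the example attribute names, the use of `difflib.SequenceMatcher` as the similarity predicate, and the choice to copy the first record's value when identifying attributes are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Similarity = Callable[[str, str], bool]


def equal(a: str, b: str) -> bool:
    """Strict equality, the strongest similarity predicate."""
    return a == b


def similar(threshold: float) -> Similarity:
    """Build a similarity predicate from a string-ratio threshold (assumed metric)."""
    def predicate(a: str, b: str) -> bool:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    return predicate


@dataclass
class MatchingDependency:
    # Left-hand side: (attribute of r1, attribute of r2, similarity predicate).
    lhs: List[Tuple[str, str, Similarity]]
    # Right-hand side: attribute pairs whose values should be identified.
    rhs: List[Tuple[str, str]]

    def applies(self, r1: Record, r2: Record) -> bool:
        """True if every left-hand-side comparison holds for the pair."""
        return all(pred(r1[a], r2[b]) for a, b, pred in self.lhs)

    def enforce(self, r1: Record, r2: Record) -> bool:
        """Dynamic reading: if the left-hand side holds, update the records so
        the right-hand-side values agree. Copying r1's value into r2 is an
        assumption made only for this sketch."""
        if not self.applies(r1, r2):
            return False
        for a, b in self.rhs:
            r2[b] = r1[a]
        return True


def matches_by_rck(rck: List[Tuple[str, str, Similarity]],
                   r1: Record, r2: Record) -> bool:
    """Treat an RCK as a list of comparisons; the pair matches if all hold."""
    return all(pred(r1[a], r2[b]) for a, b, pred in rck)


if __name__ == "__main__":
    card = {"ln": "Smith", "addr": "10 Oak Street", "tel": "555-0100"}
    bill = {"sn": "Smith", "post": "10 Oak St", "phn": "555-0100"}

    # If last names and phone numbers agree, identify the two addresses.
    md = MatchingDependency(
        lhs=[("ln", "sn", equal), ("tel", "phn", equal)],
        rhs=[("addr", "post")],
    )
    print(md.enforce(card, bill), bill["post"])

    # A key that compares last names exactly and addresses approximately.
    rck = [("ln", "sn", equal), ("addr", "post", similar(0.7))]
    print(matches_by_rck(rck, card, bill))
```

The sketch compares attributes across two differently named schemas, which mirrors the abstract's point that records may come from possibly different relations. The phone number stands in for a "more reliable attribute" that allows a match even when the address fields disagree.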