In recent years, database systems have proliferated in all types of organizations. In many cases, these databases are developed in different departments and maintained autonomously. Much is to be gained, however, if databases across departments, divisions, or even organizations can be related to one another. A main obstacle to relating data stored in different databases is that they represent real-world entities differently, for example by using different identifiers or primary keys. We present a decision-theoretic model for matching entities across different databases. The decision to match two entities from two different databases inherently involves uncertainty, since an exact match may not be found because of errors in data collection, data entry, and data representation. We model this uncertainty using probability theory and propose an integer programming formulation that minimizes the total cost associated with the entity matching decision. The model has been implemented and validated on real-world data.
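The idea of a cost-minimizing matching decision can be illustrated with a minimal sketch. It is not the paper's formulation, which the abstract does not spell out. The sketch assumes a hypothetical matrix `p` where `p[i][j]` is the estimated probability that record `i` of one database and record `j` of the other denote the same entity, plus two hypothetical cost parameters: `c_false_match` (declaring a match that is wrong) and `c_false_nonmatch` (missing a true match). It also assumes each record may be matched at most once. For tiny inputs, the resulting 0-1 program can be solved by plain enumeration:

```python
from itertools import permutations


def expected_cost(assignment, p, c_false_match, c_false_nonmatch):
    """Expected cost of a matching decision.

    assignment[i] is the index j matched to record i, or None if
    record i is left unmatched.
    """
    total = 0.0
    for i, row in enumerate(p):
        for j, p_ij in enumerate(row):
            if assignment[i] == j:
                # Declared a match: pay if the pair is not a true match.
                total += (1.0 - p_ij) * c_false_match
            else:
                # Declared a non-match: pay if the pair is a true match.
                total += p_ij * c_false_nonmatch
    return total


def best_matching(p, c_false_match=1.0, c_false_nonmatch=1.0):
    """Enumerate all one-to-one partial matchings and keep the cheapest.

    Exponential in the input size, so suitable only for illustration.
    """
    n_a, n_b = len(p), len(p[0])
    # Each index of the second database appears once (used at most once);
    # None is repeated so that any record may stay unmatched.
    slots = list(range(n_b)) + [None] * n_a
    best, best_cost = None, float("inf")
    seen = set()
    for candidate in permutations(slots, n_a):
        if candidate in seen:  # duplicates arise from the repeated None
            continue
        seen.add(candidate)
        cost = expected_cost(candidate, p, c_false_match, c_false_nonmatch)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical match probabilities for two records in each database.
p = [[0.9, 0.2],
     [0.1, 0.4]]

matching, cost = best_matching(p)
print(matching, round(cost, 3))
```

With equal costs, the second record is left unmatched because its best candidate has a match probability below one half. Raising `c_false_nonmatch` makes missed matches more expensive and tips that decision toward a match. A realistic instance would need an integer programming solver rather than enumeration.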