Generic Entity Resolution in Relational Databases

  • Authors:
  • Csaba István Sidló

  • Affiliations:
  • Data Mining and Web Search Research Group, Informatics Laboratory, Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary 1111

  • Venue:
  • ADBIS '09 Proceedings of the 13th East European Conference on Advances in Databases and Information Systems
  • Year:
  • 2009

Abstract

Entity Resolution (ER) is the problem of identifying distinct representations of the same real-world entities in heterogeneous databases. We consider the generic formulation of ER problems (GER) with exact outcome. In practice, input data usually resides in relational databases and can grow to huge volumes, yet typical solutions described in the literature employ standalone, memory-resident algorithms. In this paper we use the facilities of standard, unmodified relational database management systems (RDBMS) to improve the efficiency of GER algorithms. We study and revise the problem formulation, and propose practical and efficient algorithms optimized for RDBMS external-memory processing. We outline a real-world scenario and demonstrate the advantage of the proposed algorithms through experiments on insurance customer data.
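
To make the idea of letting the database do the matching work concrete, the following is a minimal sketch and not the paper's algorithms. It assumes a hypothetical `customer` table and a hypothetical match rule (equal case-insensitive email or equal phone), runs the pairwise match step as a SQL self-join in SQLite, and then groups matching records by transitive closure with a small union-find. Generic ER additionally merges matched records and re-compares the merged result, which this simplification omits.

```python
import sqlite3


def resolve(rows):
    """Group record ids that are linked by the (hypothetical) match rule.

    rows: iterable of (id, name, email, phone) tuples.
    Returns a sorted list of id clusters.
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen only for this illustration.
    con.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
    )
    con.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)

    # Match step pushed into the database as a self-join.
    # Comparisons involving NULL are not true in SQL, so missing
    # values never produce a match.
    pairs = con.execute(
        """
        SELECT a.id, b.id
        FROM customer AS a
        JOIN customer AS b ON a.id < b.id
        WHERE lower(a.email) = lower(b.email)
           OR a.phone = b.phone
        """
    ).fetchall()
    ids = [r[0] for r in con.execute("SELECT id FROM customer ORDER BY id")]
    con.close()

    # Union-find over the matched pairs (transitive closure).
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    sample = [
        (1, "Ann Lee", "ann@example.com", "111"),
        (2, "A. Lee", "ANN@EXAMPLE.COM", None),
        (3, "Bob Ray", "bob@example.com", "222"),
        (4, "Ann L.", None, "111"),
    ]
    print(resolve(sample))
```

In this sketch only the pair generation runs inside the database; the clustering still happens in application memory, so it illustrates the division of labor rather than a fully external-memory solution.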