Evaluation of entity resolution approaches on real-world match problems

  • Authors:
  • Hanna Köpcke;Andreas Thor;Erhard Rahm

  • Affiliations:
  • University of Leipzig, Germany;University of Leipzig, Germany;University of Leipzig, Germany

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

Despite the huge amount of recent research efforts on entity resolution (matching) there has not yet been a comparative evaluation on the relative effectiveness and efficiency of alternate approaches. We therefore present such an evaluation of existing implementations on challenging real-world match tasks. We consider approaches both with and without using machine learning to find suitable parameterization and combination of similarity functions. In addition to approaches from the research community we also consider a state-of-the-art commercial entity resolution implementation. Our results indicate significant quality and efficiency differences between different approaches. We also find that some challenging resolution tasks such as matching product entities from online shops are not sufficiently solved with conventional approaches based on the similarity of attribute values.