Learning-based entity resolution with MapReduce

  • Authors: Lars Kolb; Hanna Köpcke; Andreas Thor; Erhard Rahm

  • Affiliations: University of Leipzig, Leipzig, Germany (all four authors)

  • Venue: Proceedings of the third international workshop on Cloud data management
  • Year: 2011

Abstract

Entity resolution is a crucial step for data quality and data integration. Learning-based approaches show high effectiveness at the expense of poor efficiency. To reduce the typically high execution times, we investigate how learning-based entity resolution can be realized in a cloud infrastructure using MapReduce. We propose and evaluate two efficient MapReduce-based strategies for pair-wise similarity computation and classifier application on the Cartesian product of two input sources. Our evaluation is based on real-world datasets and shows the high efficiency and effectiveness of the proposed approaches.
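
The abstract does not spell out the two strategies, so the following is only a minimal single-process sketch of the general idea, not the authors' method: split both sources into partitions, let a map step emit one task per pair of partitions so that the tasks jointly cover the Cartesian product, and let a reduce step compute pair-wise similarity features and apply a classifier. The record fields (`id`, `title`, `maker`), the partition size, and the threshold rule standing in for a trained classifier are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def chunks(records, size):
    """Split a list of records into consecutive partitions of at most `size`."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_phase(source_r, source_s, size):
    """Emit one task per pair of partitions; together the tasks cover R x S."""
    for part_r, part_s in product(chunks(source_r, size), chunks(source_s, size)):
        yield part_r, part_s


def similarity_features(a, b, attributes=("title", "maker")):
    """Hypothetical feature vector: one string similarity per attribute."""
    return [
        SequenceMatcher(None, a[k].lower(), b[k].lower()).ratio()
        for k in attributes
    ]


def classify(features, threshold=0.8):
    """Stand-in for a trained classifier: mean similarity against a threshold."""
    return sum(features) / len(features) >= threshold


def reduce_phase(task):
    """Compare every record pair inside one task and keep the predicted matches."""
    part_r, part_s = task
    return [
        (a["id"], b["id"])
        for a, b in product(part_r, part_s)
        if classify(similarity_features(a, b))
    ]


def resolve(source_r, source_s, size=2):
    """Run all tasks sequentially; a MapReduce cluster would run them in parallel."""
    matches = []
    for task in map_phase(source_r, source_s, size):
        matches.extend(reduce_phase(task))
    return sorted(matches)
```

Because each task depends only on its own two partitions, the tasks can be distributed across workers without coordination. In a real deployment the threshold rule would be replaced by a model trained on labeled match and non-match pairs.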