Efficient entity resolution methods for heterogeneous information spaces

  • Authors:
  • George Papadakis;Wolfgang Nejdl

  • Affiliations:
  • L3S Research Center, Leibniz Universität Hannover, Appelstr. 9A, Germany;L3S Research Center, Leibniz Universität Hannover, Appelstr. 9A, Germany

  • Venue:
  • ICDEW '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering Workshops
  • Year:
  • 2011

Abstract

The Web of Data encompasses a voluminous, yet constantly expanding collection of structured and semi-structured data sets. An important prerequisite for leveraging them is the detection (and merging) of information that describes the same real-world entities, a task known as Entity Resolution. To enhance the efficiency of this quadratic task, blocking techniques are typically employed. They are, however, inapplicable to the Web of Data, due to the noise, the loose schema binding, and the unprecedented heterogeneity inherent in it. In the context of my thesis, I focus on developing novel blocking methods that scale up Entity Resolution within such large, noisy, and heterogeneous information spaces. At their core lies an attribute-agnostic mechanism that relies exclusively on the values of entity profiles in order to build blocks effectively. The resulting set of blocks is processed efficiently by intelligent techniques that minimize the required number of comparisons. Any combination of block building and block processing methods is possible, allowing for high flexibility of the overall approach. Initial experimental studies on large, real-world data sets have produced promising results.
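To make the two stages described in the abstract concrete, the following is a minimal sketch, not the method from the thesis itself. It assumes one simple attribute-agnostic scheme (every token found in any attribute value becomes a block key) and one simple block processing step (blocks are visited smallest first and each pair is compared at most once). The function names `build_blocks` and `candidate_pairs` and the sample profiles are hypothetical and serve only as illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_blocks(profiles):
    """Block building: group profile ids by the tokens in their values.

    Attribute names are ignored entirely, so profiles with different
    schemas can still end up in the same block.
    """
    blocks = defaultdict(set)
    for pid, profile in profiles.items():
        for value in profile.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(pid)
    # A block holding a single profile yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Block processing: yield each co-occurring pair exactly once.

    Blocks are visited from smallest to largest (ties broken by token),
    and pairs already seen in an earlier block are skipped.
    """
    seen = set()
    for token in sorted(blocks, key=lambda t: (len(blocks[t]), t)):
        for pair in combinations(sorted(blocks[token]), 2):
            if pair not in seen:
                seen.add(pair)
                yield pair


# Hypothetical profiles with deliberately different attribute names.
profiles = {
    "p1": {"name": "Alice Smith", "city": "Berlin"},
    "p2": {"fullName": "A. Smith", "location": "Berlin Germany"},
    "p3": {"label": "Bob Jones", "based_in": "Germany"},
    "p4": {"title": "Carol White"},
}

blocks = build_blocks(profiles)
pairs = list(candidate_pairs(blocks))
print(pairs)  # [('p1', 'p2'), ('p2', 'p3')]
```

On this toy input, comparing every profile with every other would take 6 comparisons, while the sketch suggests only 2. The pair (p1, p2) shares two blocks ("berlin" and "smith") but is emitted once, which is the kind of saving the block processing stage aims for. Because the two functions are independent, either one could be swapped for a different block building or block processing method.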