Adaptive Connection Strength Models for Relationship-Based Entity Resolution

  • Authors:
  • Rabia Nuray-Turan;Dmitri V. Kalashnikov;Sharad Mehrotra

  • Affiliations:
  • University of California, Irvine;University of California, Irvine;University of California, Irvine

  • Venue:
  • Journal of Data and Information Quality (JDIQ) - Special Issue on Entity Resolution
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Entity Resolution (ER) is a data quality challenge that deals with ambiguous references in data and whose task is to identify all references that co-refer. Due to practical significance of the ER problem, many creative ER techniques have been proposed in the past, including those that analyze relationships that exist among entities in data. Such approaches view the database as an entity-relationship graph, where direct and indirect relationships correspond to paths in the graph. These techniques rely on measuring the connection strength among various nodes in the graph by using a connection strength (CS) model. While such approaches have demonstrated significant advantage over traditional ER techniques, currently they also have a significant limitation: the CS models that they use are intuition-based fixed models that tend to behave well in general, but are very generic and not tuned to a specific domain, leading to suboptimal result quality. Hence, in this article we propose an approach that employs supervised learning to adapt the connection strength measure to the given domain using the available past/training data. The adaptive approach has several advantages: it increases both the quality and efficiency of ER and it also minimizes the domain analyst participation needed to tune the CS model to the given domain. The extensive empirical evaluation demonstrates that the proposed approach reaches up to 8% higher accuracy than the graph-based ER methods that use fixed and intuition-based CS models.