A joint model for discovering and linking entities

  • Authors:
  • Michael Wick;Sameer Singh;Harshal Pandya;Andrew McCallum

  • Affiliations:
  • University of Massachusetts, Amherst, MA, USA (all authors)

  • Venue:
  • Proceedings of the 2013 workshop on Automated knowledge base construction
  • Year:
  • 2013

Abstract

Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base construction and management. However, performing entity resolution at large scales is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. Conventional wisdom holds that performing coreference at these scales requires decomposing the problem: first solving the simpler task of entity linking (matching a set of mentions to a known set of KB entities), and then performing entity discovery as a post-processing step (identifying new entities not present in the KB). We argue that this traditional approach is harmful to both entity-linking and overall coreference accuracy. We therefore embrace the challenge of jointly modeling entity linking and entity discovery as a single entity resolution problem. To make progress toward scalability, we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck inherent in map-reduce architectures. We demonstrate that more test-time data actually improves the accuracy of coreference, and show that joint coreference is substantially more accurate than traditional entity linking, reducing error by 75%.
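
To give a rough sense of the search-space growth the abstract refers to, the sketch below counts the ways n mentions can be partitioned into entities. It assumes the search space is the set of all clusterings of the mentions, which the abstract does not state explicitly; under that assumption the count is the Bell number B(n), which grows even faster than an exponential. The function name `bell_numbers` is illustrative and does not come from the paper.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], computed with the Bell triangle.

    B(n) is the number of ways to partition n items into non-empty groups.
    Assumption: each partition of the mentions is one candidate clustering
    into entities, so B(n) is the size of the coreference search space.
    """
    bells = [1]   # B(0) = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Every further entry adds the entry above-left to the entry on its left.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    for n, count in enumerate(bell_numbers(10)):
        print(f"{n:2d} mentions -> {count} possible clusterings")
```

Already at 10 mentions there are 115975 candidate clusterings, which suggests why exhaustive search is not an option at knowledge-base scale.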