IdentityRank: Named entity disambiguation in the news domain

  • Authors:
  • Norberto Fernández; Jesús Arias Fisteus; Luis Sánchez; Gonzalo López

  • Affiliations:
  • Telematics Engineering Department, Carlos III University of Madrid, Universidad 30, E-28911 Leganés, Madrid, Spain

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012


Abstract

News companies produce news items that describe events happening in the world. These items usually contain mentions of persons, organizations, locations and other types of named entities involved in the events being described. Such named entities may be ambiguous, which degrades the performance of free-text information retrieval systems. This paper describes the IdentityRank algorithm, which addresses the problem of named entity disambiguation in news items. It was developed as part of the EU-funded project News Engine Web Services (NEWS) and is specifically designed to operate within the editorial environment of a news company. The algorithm was implemented and evaluated on several corpora of real news items, achieving an average accuracy of around 96%.
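The abstract does not describe how IdentityRank works internally. The sketch below only illustrates the task it addresses: mapping an ambiguous mention in a news item to one of several candidate identities, and scoring the result as accuracy (correct decisions divided by total decisions). It uses a simple context-overlap heuristic as a swapped-in baseline, not the authors' method. The `Candidate` type, the candidate identities and their context terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A possible real-world identity for an ambiguous mention (hypothetical model)."""
    identity: str                  # hypothetical identifier of the entity
    context_terms: frozenset[str]  # lower-cased terms typically associated with it


def tokenize(text: str) -> set[str]:
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    return {tok.strip(".,;:!?\"'()").lower() for tok in text.split()} - {""}


def disambiguate(news_text: str, candidates: list[Candidate]) -> str:
    """Baseline (not IdentityRank): pick the candidate whose context terms
    overlap most with the words of the news item; ties go to the first."""
    words = tokenize(news_text)
    best = max(candidates, key=lambda c: len(words & c.context_terms))
    return best.identity


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of mentions whose predicted identity matches the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical candidates for the ambiguous mention "Jordan".
CANDIDATES = [
    Candidate("Jordan (country)", frozenset({"amman", "king", "border", "middle", "east"})),
    Candidate("Michael Jordan (basketball player)", frozenset({"nba", "bulls", "basketball", "points"})),
]

if __name__ == "__main__":
    items = [
        "Jordan scored 40 points as the Bulls beat the Knicks.",
        "Jordan reinforced its border with Syria, officials in Amman said.",
    ]
    gold = ["Michael Jordan (basketball player)", "Jordan (country)"]
    predicted = [disambiguate(text, CANDIDATES) for text in items]
    print(predicted)
    print(accuracy(predicted, gold))
```

A bag-of-words overlap like this ignores relations between entities and any editorial metadata. The abstract says IdentityRank is specifically designed to exploit the editorial environment of a news company, so this toy baseline should not be read as representative of its behaviour or its reported accuracy.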