Building a cross-language entity linking collection in twenty-one languages

  • Authors and affiliations:
  • James Mayfield (Johns Hopkins University Human Language Technology Center of Excellence)
  • Dawn Lawrie (Johns Hopkins University Human Language Technology Center of Excellence and Loyola University Maryland)
  • Paul McNamee (Johns Hopkins University Human Language Technology Center of Excellence)
  • Douglas W. Oard (Johns Hopkins University Human Language Technology Center of Excellence and University of Maryland, College Park)

  • Venue:
  • CLEF'11: Proceedings of the Second International Conference on Multilingual and Multimodal Information Access Evaluation
  • Year:
  • 2011

Abstract

We describe an efficient way to create a test collection for evaluating the accuracy of cross-language entity linking. Queries are created by semiautomatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word alignments. We applied the technique to produce the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.
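
The projection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `project_span` and `project_name`, the representation of alignments as (English index, target index) pairs, and the toy sentence pair are all assumptions made for illustration. The sketch takes the token span of a name on the English side and returns the smallest target-side span that covers every token aligned to it.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: each alignment link is a pair
# (english_token_index, target_token_index), both zero-based.
Alignment = List[Tuple[int, int]]


def project_span(span: Tuple[int, int], alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Project an inclusive English token span onto the target sentence.

    Returns the smallest inclusive target span covering every target token
    aligned to a token inside the English span, or None if nothing aligns.
    """
    start, end = span
    targets = sorted({t for s, t in alignment if start <= s <= end})
    if not targets:
        return None
    return (targets[0], targets[-1])


def project_name(span: Tuple[int, int], alignment: Alignment,
                 target_tokens: List[str]) -> Optional[str]:
    """Return the projected name as a string, or None if projection fails."""
    projected = project_span(span, alignment)
    if projected is None:
        return None
    first, last = projected
    return " ".join(target_tokens[first:last + 1])


# Toy sentence pair, invented for this example (not taken from the collection).
english = ["President", "Barack", "Obama", "visited", "Berlin"]
spanish = ["El", "presidente", "Barack", "Obama", "visitó", "Berlín"]
links = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

name_span = (1, 2)  # "Barack Obama" on the English side
print(project_name(name_span, links, spanish))
```

Taking the minimum and maximum aligned positions is only one possible design choice. It tolerates gaps in the alignment, but it can over-extend the projected name when a single link is wrong, so a real pipeline would likely need additional filtering.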