Proximity-based document representation for named entity retrieval

  • Authors:
  • Desislava Petkova;W. Bruce Croft

  • Affiliations:
  • University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

One aspect in which retrieving named entities is different from retrieving documents is that the items to be retrieved - persons, locations, organizations - are only indirectly described by documents throughout the collection. Much work has been dedicated to finding references to named entities, in particular to the problems of named entity extraction and disambiguation. However, just as important for retrieval performance is how these snippets of text are combined to build named entity representations. We focus on the TREC expert search task where the goal is to identify people who are knowledgeable on a specific topic. Existing language modeling techniques for expert finding assume that terms and person entities are conditionally independent given a document. We present theoretical and experimental evidence that this simplifying assumption ignores information on how named entities relate to document content. To address this issue, we propose a new document representation which emphasizes text in proximity to entities and thus incorporates sequential information implicit in text. Our experiments demonstrate that the proposed model significantly improves retrieval performance. The main contribution of this work is an effective formal method for explicitly modeling the dependency between the named entities and terms which appear in a document.