The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection

  • Authors:
  • Gregory Crane; Alison Jones

  • Affiliations:
  • Tufts University, Medford, MA; Tufts University, Medford, MA

  • Venue:
  • Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2006

Abstract

This paper evaluates the automatic extraction of ten classes of named entities from a 19th-century newspaper: the Civil War years of the Richmond Times Dispatch, digitized by the University of Richmond with IMLS support. We analyze how successfully each of these ten classes, all prominent in the newspaper, can be identified, and the particular problems each class raises. Personal and place names are familiar, but some more important categories, such as ship names and military units, illustrate the challenges that named entity identification confronts as it evolves into a fundamental tool not only for automatic metadata generation but also for searching and browsing. We conclude by suggesting the kinds of knowledge sources that digital libraries need to assemble as part of their machine-readable reference collections to support named entity identification as a core service.
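Evaluating extraction "class by class", as the abstract describes, is commonly done by comparing system output against a hand-annotated gold standard and reporting precision, recall, and F1 separately for each entity class. The sketch below shows one such scheme, exact-match scoring over character spans. It is an illustration under stated assumptions, not the authors' evaluation procedure: the span representation, the class labels (`person`, `place`, `ship`, `military_unit`), the function name `per_class_scores`, and the toy offsets are all hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: a mention is (start_offset, end_offset, entity_class).
Mention = tuple[int, int, str]


def per_class_scores(gold: set[Mention], predicted: set[Mention]) -> dict[str, dict[str, float]]:
    """Exact-match precision, recall, and F1 for each entity class.

    A predicted mention counts as correct only if its span AND class both
    match a gold mention exactly. Scores with a zero denominator are set
    to 0.0 by convention.
    """
    tp: dict[str, int] = defaultdict(int)  # true positives per class
    fp: dict[str, int] = defaultdict(int)  # false positives per class
    fn: dict[str, int] = defaultdict(int)  # false negatives per class

    for m in predicted:
        if m in gold:
            tp[m[2]] += 1
        else:
            fp[m[2]] += 1
    for m in gold - predicted:
        fn[m[2]] += 1

    scores: dict[str, dict[str, float]] = {}
    for cls in set(tp) | set(fp) | set(fn):
        p = tp[cls] / (tp[cls] + fp[cls]) if tp[cls] + fp[cls] else 0.0
        r = tp[cls] / (tp[cls] + fn[cls]) if tp[cls] + fn[cls] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[cls] = {"precision": p, "recall": r, "f1": f}
    return scores


# Toy, invented annotations: the system finds the ship and the military unit,
# but tags only part of a two-word personal name, and with the wrong class.
gold = {(0, 14, "person"), (30, 38, "ship"), (50, 68, "military_unit")}
predicted = {(0, 8, "place"), (30, 38, "ship"), (50, 68, "military_unit")}

scores = per_class_scores(gold, predicted)
for cls in sorted(scores):
    print(cls, scores[cls])
```

Under exact matching, the partial, mislabeled mention is penalized twice: once as a false positive for `place` and once as a false negative for `person`. Looser schemes (for example, crediting overlapping spans) would score it differently, which is one reason per-class results depend heavily on the matching rule chosen.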