The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection

Authors:
Gregory Crane;Alison Jones
Affiliations:
Tufts University, Medford, MA;Tufts University, Medford, MA
Venue:
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Year:
2006

Abstract

This paper evaluates automatic extraction of ten named entity classes from a 19th-century newspaper, the Civil War years of the Richmond Times Dispatch, digitized with IMLS support by the University of Richmond. It analyzes success with these ten categories, all prominent in these newspapers, and the particular problems that these classes of named entities raise. Personal and place names are familiar, but other important categories (such as ship names and military units) illustrate some of the challenges that named entity identification confronts as it evolves into a fundamental tool not only for automatic metadata generation but also for searching and browsing. We conclude by suggesting the kinds of knowledge sources that digital libraries need to assemble as part of their machine-readable reference collections to support named entity identification as a core service.