Nested Named Entity Recognition in Historical Archive Text

Authors:
Kate Byrne
Affiliations:
University of Edinburgh, UK
Venue:
ICSC '07 Proceedings of the International Conference on Semantic Computing
Year:
2007

Abstract

This paper describes work on Named Entity Recognition (NER), in preparation for Relation Extraction (RE), on data from a historical archive organisation. As is often the case in the cultural heritage domain, the source text includes a high percentage of specialist terminology, and is of very variable quality in terms of grammaticality and completeness. The NER and RE tasks were carried out using a specially annotated corpus, and are themselves preliminary steps in a larger project whose aim is to transform discovered relations into a graph structure that can be queried using standard tools. Experimental results from the NER task are described, with emphasis on dealing with nested entities using a multi-word token method. The overall objective is to improve access by non-specialist users to a valuable cultural resource.