Structured named entities in two distinct press corpora: contemporary broadcast news and old newspapers

Authors:
Sophie Rosset;Cyril Grouin;Karën Fort;Olivier Galibert;Juliette Kahn;Pierre Zweigenbaum
Affiliations:
LIMSI-CNRS, France;LIMSI-CNRS, France;INIST-CNRS, France and LIPN, France;LNE, France;LNE, France;LIMSI-CNRS, France
Venue:
LAW VI '12 Proceedings of the Sixth Linguistic Annotation Workshop
Year:
2012

Abstract

This paper compares the reference annotation of structured named entities in two corpora with different origins and properties. It addresses two questions raised by such a comparison. First, what specific issues arose from reusing the same annotation scheme on a corpus that differs from the first in medium and predates it by more than a century? Second, what contrasts were observed in the resulting annotations across the two corpora?