An analysis of the named entity recognition problem in digital library metadata

Authors:
Nuno Freire; José Borbinha; Pável Calado
Affiliations:
IST/INESC-ID, Lisbon, Portugal; IST/INESC-ID, Lisbon, Portugal; IST/INESC-ID, Lisbon, Portugal
Venue:
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Year:
2012

Abstract

Information resources in digital libraries are usually described, along with their context, by structured data records, commonly referred to as metadata. These records often contain unstructured natural language text, either because they typically follow a data model that defines only generic semantics for its data elements, or because the model includes data elements designed to hold free text. The information in these data elements, although machine readable, resides in unstructured natural language text that is difficult for computers to process. This paper addresses a particular information extraction task, typically called named entity recognition, which deals with references to entities made by names occurring in text. It presents the results of a study of how the named entity recognition problem manifests itself in digital library metadata, in particular the main differences between performing named entity recognition in natural language text and in the text within metadata. The paper concludes with a novel approach for named entity recognition in metadata.