Semi-supervised named entity recognition: learning to recognize 100 entity types with little supervision

Authors:
David Nadeau
Affiliations:
University of Ottawa (Canada)
Venue:
Semi-supervised named entity recognition: learning to recognize 100 entity types with little supervision
Year:
2007

Citing 0
Cited 10

VIBES: visualizing changing emotional states in personal stories
SRMC '08 Proceedings of the 2nd ACM international workshop on Story representation, mechanism and context

A methodology towards effective and efficient manual document annotation: addressing annotator discrepancy and annotation quality
EKAW'10 Proceedings of the 17th international conference on Knowledge engineering and management by the masses

Creating knowledge out of interlinked data: making the web a data washing machine
Proceedings of the International Conference on Web Intelligence, Mining and Semantics

Introduction to linked data and its lifecycle on the web
RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data

SCMS: semantifying content management systems
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part II

Sentimatrix: multilingual sentiment analysis service
WASSA '11 Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis

Bootstrapped named entity recognition for product attribute extraction
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Using semantic roles to improve summaries
ENLG '11 Proceedings of the 13th European Workshop on Natural Language Generation

Introduction to linked data and its lifecycle on the web
RW'13 Proceedings of the 9th international conference on Reasoning Web: semantic technologies for intelligent data access

Crime profiling for the Arabic language using computational linguistic techniques
Information Processing and Management: an International Journal

Quantified Score

Hi-index	0.00

Abstract

Named Entity Recognition (NER) aims to extract and classify rigid designators in text, such as proper names, biological species, and temporal expressions. Interest in this field of research has grown since the early 1990s. In this thesis, we document a trend away from handcrafted rules and towards machine learning approaches. Still, recent machine learning approaches depend on annotated data, and its limited availability is a serious shortcoming in building and maintaining large-scale NER systems. We present an NER system built with very little supervision: human supervision is limited to listing a few examples of each named entity (NE) type. First, we introduce a proof-of-concept semi-supervised system that can recognize four NE types. Then, we expand its capabilities by improving key technologies, and we apply the system to an entire hierarchy of 100 NE types. Our work makes the following contributions: the creation of a proof-of-concept semi-supervised NER system; the demonstration of an innovative noise filtering technique for generating NE lists; the validation of a strategy for learning disambiguation rules using automatically identified, unambiguous NEs; and finally, the development of an acronym detection algorithm, thus solving a rare but very difficult problem in alias resolution. We believe semi-supervised learning techniques are about to break new ground in the machine learning community. In this thesis, we show that complete NER systems can be built with limited supervision. On standard evaluation corpora, we report performance comparable to that of baseline supervised systems on the task of annotating NEs in text.
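The idea of growing an NE list from "a few examples of each NE type" can be illustrated with a toy sketch. This is not the thesis's algorithm, which the abstract does not describe in detail. The function name `expand_seeds`, the thresholds `min_seed_hits` and `min_support`, and the sample data are all hypothetical. The sketch assumes candidate entities arrive as pre-extracted lists of strings, trusts a list only when it already contains several seeds, and uses a simple vote count as a stand-in for noise filtering.

```python
from collections import Counter


def expand_seeds(seeds, lists, min_seed_hits=2, min_support=2):
    """Grow an entity list from a few seed examples (illustrative only)."""
    seed_set = {s.lower() for s in seeds}
    votes = Counter()
    for items in lists:
        normalized = {item.lower() for item in items}
        # Trust a list only if it already contains several known seeds.
        if len(normalized & seed_set) >= min_seed_hits:
            # Every non-seed item in a trusted list gets one vote.
            votes.update(normalized - seed_set)
    # Crude noise filter: keep candidates found in several trusted lists.
    return sorted(c for c, n in votes.items() if n >= min_support)


# Hypothetical seeds for a "city" entity type and hypothetical source lists.
seeds = ["Ottawa", "Toronto", "Montreal"]
lists = [
    ["Ottawa", "Toronto", "Vancouver", "Calgary"],
    ["Montreal", "Ottawa", "Vancouver", "Home"],
    ["Toronto", "Contact", "About"],
    ["Montreal", "Toronto", "Calgary"],
]
new_entities = expand_seeds(seeds, lists)
print(new_entities)
```

In this toy data, "Home" is proposed by only one trusted list and is dropped by the vote threshold, while the third list is ignored entirely because it contains a single seed.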