Teaching a weaker classifier: named entity recognition on upper case text

Authors: Hai Leong Chieu; Hwee Tou Ng
Affiliations: DSO National Laboratories, Singapore; National University of Singapore, Singapore
Venue: ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Year: 2002

Abstract

This paper describes how a machine-learning named entity recognizer (NER) for upper case text can be improved by using a mixed case NER and some unlabeled text. The mixed case NER is used to tag unlabeled mixed case text, which, once converted to upper case, serves as additional training material for the upper case NER. We show that this approach substantially reduces the performance gap between the mixed case NER and the upper case NER: by 39% on the MUC-6 and by 22% on the MUC-7 named entity test data. Our method is thus useful for improving the accuracy of NERs on upper case text, such as transcribed text from automatic speech recognizers, where case information is missing.
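The teaching step the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tagger is modeled as any callable from tokens to labels, `make_upper_case_training_data` and `gap_reduction` are hypothetical helper names, and `gap_reduction` assumes the reported 39% and 22% are relative reductions of the F-measure gap. The toy tagger and the F-measures in the usage block are invented for illustration.

```python
from typing import Callable, List, Tuple

Sentence = List[str]                    # tokens of one sentence
TaggedSentence = List[Tuple[str, str]]  # (token, entity label) pairs


def make_upper_case_training_data(
    mixed_case_tagger: Callable[[Sentence], List[str]],
    unlabeled_sentences: List[Sentence],
) -> List[TaggedSentence]:
    """Label mixed case sentences with the stronger tagger, then upper-case them.

    Hypothetical helper: the labels come from the mixed case tagger and only
    the tokens' surface form changes, so the upper case learner trains on
    caseless input like the input it will see at test time.
    """
    training_data: List[TaggedSentence] = []
    for sentence in unlabeled_sentences:
        labels = mixed_case_tagger(sentence)  # tag while case is still available
        if len(labels) != len(sentence):
            raise ValueError("tagger must return one label per token")
        training_data.append([(tok.upper(), lab) for tok, lab in zip(sentence, labels)])
    return training_data


def gap_reduction(f_mixed: float, f_upper_before: float, f_upper_after: float) -> float:
    """Fraction of the mixed-vs-upper case F-measure gap closed (assumed definition)."""
    gap = f_mixed - f_upper_before
    if gap <= 0:
        raise ValueError("expected the mixed case NER to outperform the upper case NER")
    return (f_upper_after - f_upper_before) / gap


def toy_tagger(sentence: Sentence) -> List[str]:
    """Hypothetical stand-in for a trained mixed case NER.

    It relies on capitalization, which is exactly the cue an upper case NER lacks.
    """
    return ["NAME" if tok[:1].isupper() else "O" for tok in sentence]


if __name__ == "__main__":
    print(make_upper_case_training_data(toy_tagger, [["visit", "Singapore", "today"]]))
    # Invented F-measures, for illustration only.
    print(round(gap_reduction(90.0, 80.0, 83.9), 2))
```

Running the script prints `[[('VISIT', 'O'), ('SINGAPORE', 'NAME'), ('TODAY', 'O')]]` and then `0.39`. In a full system, the upper-cased pairs would be added to the existing upper case training set before retraining; that learner is deliberately left out here, since the abstract does not specify it.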