Empirical study on the performance stability of named entity recognition model across domains

Authors:
Hong Lei Guo;Li Zhang;Zhong Su
Affiliations:
IBM China Research Laboratory, Haidian District, Beijing, P.R.C.;IBM China Research Laboratory, Haidian District, Beijing, P.R.C.;IBM China Research Laboratory, Haidian District, Beijing, P.R.C.
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Citing 15
Cited 4

An Algorithm that Learns What's in a Name

Machine Learning - Special issue on natural language learning
Employing EM and Pool-Based Active Learning for Text Classification

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
A maximum entropy approach to named entity recognition

A maximum entropy approach to named entity recognition
Text chunking based on a generalization of winnow

The Journal of Machine Learning Research
Efficient support vector classifiers for named entity recognition

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach

Computational Linguistics
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
A simple named entity extractor using AdaBoost

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Named entity recognition through classifier combination

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Named entity recognition with character-level models

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Memory-based named entity recognition using unannotated data

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
HowtogetaChineseName(Entity): segmentation and combination issues

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Multi-criteria-based active learning for named entity recognition

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Chinese named entity recognition based on multiple features

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Chinese named entity recognition based on multilevel linguistic features

IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Address standardization with latent semantic association

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Regular expression learning for information extraction

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Domain adaptation with latent semantic association for named entity recognition

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
EagleEye: entity-centric business intelligence for smarter decisions

IBM Journal of Research and Development

Abstract

When a machine learning-based named entity recognition system is employed in a new domain, its performance usually degrades. In this paper, we provide an empirical study on the impact of training data size and domain information on the performance stability of named entity recognition models. We present an informative sample selection method for building high-quality, stable named entity recognition models across domains. Experimental results show that the performance of the named entity recognition model is enhanced significantly after being trained with these informative samples.