HowtogetaChineseName(Entity): segmentation and combination issues

Authors:
Hongyan Jing;Radu Florian;Xiaoqiang Luo;Tong Zhang;Abraham Ittycheriah
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Year:
2003

Citing 14
Cited 14

Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging

Computational Linguistics
Learning to Parse Natural Language with Maximum Entropy Models

Machine Learning - Special issue on natural language learning
An Algorithm that Learns What‘s in a Name

Machine Learning - Special issue on natural language learning
Text chunking based on a generalization of winnow

The Journal of Machine Learning Research
Improving accuracy in word class tagging through the combination of machine learning systems

Computational Linguistics
Combining Classifiers for word sense disambiguation

Natural Language Engineering
Classifier combination for improved lexical disambiguation

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Applying system combination to base noun phrase identification

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Japanese named entity extraction evaluation: analysis of results

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Chinese named entity identification using class-based language model

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Combining outputs of multiple Japanese named entity chunkers by stacking

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Named entity recognition through classifier combination

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4

Chinese named entity recognition using lexicalized HMMs

ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
Factorizing complex models: a case study in mention detection

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Detecting, categorizing and clustering entity mentions in Chinese text

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Address standardization with latent semantic association

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Empirical study on the performance stability of named entity recognition model across domains

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Mention detection crossing the language barrier

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Using N-best lists for named entity recognition from Chinese speech

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Language specific issue and feature exploration in Chinese event extraction

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Morphology-Based Segmentation Combination for Arabic Mention Detection

ACM Transactions on Asian Language Information Processing (TALIP)
Cross-Language Information Propagation for Arabic Mention Detection

ACM Transactions on Asian Language Information Processing (TALIP)
Arabic Mention Detection: toward better unit of analysis

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Using deep belief nets for Chinese named entity categorization

NEWS '10 Proceedings of the 2010 Named Entities Workshop
EagleEye: entity-centric business intelligence for smarter decisions

IBM Journal of Research and Development
Chinese named entity recognition based on multilevel linguistic features

IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

When building a Chinese named entity recognition system, one must deal with certain language-specific issues such as whether the model should be based on characters or words. While there is no unique answer to this question, we discuss in detail advantages and disadvantages of each model, identify problems in segmentation and suggest possible solutions, presenting our observations, analysis, and experimental results. The second topic of this paper is classifier combination. We present and describe four classifiers for Chinese named entity recognition and describe various methods for combining their outputs. The results demonstrate that classifier combination is an effective technique of improving system performance: experiments over a large annotated corpus of fine-grained entity types exhibit a 10% relative reduction in F-measure error.