Chinese named entity recognition using lexicalized HMMs

Authors:
Guohong Fu; Kang-Kwong Luke
Affiliations:
The University of Hong Kong, Hong Kong (both authors)
Venue:
ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
Year:
2005

Abstract

This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER into a single tagging task over a sequence of known words. To do this, we first use a known-word bigram model to segment a sentence into a sequence of known words, and then apply uniformly lexicalized HMMs to assign each known word a hybrid tag that indicates both its pattern in forming an entity and the category of the resulting entity. Within the HMM framework, the system integrates both the internal formation patterns of entities and their surrounding contextual clues. As a result, NER performance improves without sacrificing efficiency in training and tagging. We evaluated the system on several public corpora. The results show that lexicalized HMMs substantially improve NER performance over standard HMMs. They also indicate that, when lexicalization is applied, character-based tagging (i.e., tagging over sequences of single-character words) performs comparably to, and can even outperform, the corresponding known-word-based tagging.
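The tagging step can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions rather than taken from the paper. The hybrid tags use a B/M/E/S position prefix plus a category (e.g. `B-PER`), with `O` for non-entity words. The lexicalized model scores each (word, tag) pair given the previous (word, tag) pair and is linearly interpolated with an add-one-smoothed standard HMM using a hypothetical weight `lam`. The training data is a toy set. The names `LexicalizedHMM` and `decode_entities` are hypothetical. The known-word bigram segmentation step is omitted, so the input is assumed to be already segmented.

```python
import math
from collections import Counter

# Hypothetical sentence-start symbol, used as both previous word and previous tag.
START = ("<s>", "<s>")


class LexicalizedHMM:
    """Toy tagger scoring P(w_i, t_i | w_{i-1}, t_{i-1}) (assumed formulation).

    A lexicalized (word, tag)-pair bigram estimate is linearly interpolated with
    a standard HMM estimate P(t_i | t_{i-1}) * P(w_i | t_i). The weight `lam`
    and the add-one smoothing are illustrative assumptions.
    """

    def __init__(self, lam=0.7):
        assert 0.0 <= lam < 1.0, "lam < 1 keeps every path probability nonzero"
        self.lam = lam
        self.pair_bigram = Counter()   # ((w_prev, t_prev), (w, t)) counts
        self.pair_history = Counter()  # (w_prev, t_prev) counts as a history
        self.tag_bigram = Counter()    # (t_prev, t) counts
        self.tag_history = Counter()   # t_prev counts as a history
        self.emit = Counter()          # (t, w) counts
        self.tag_count = Counter()     # t counts as an emitting state
        self.tags = set()
        self.vocab = set()

    def train(self, sentences):
        """Each sentence is a list of (word, hybrid_tag) pairs."""
        for sent in sentences:
            prev = START
            for word, tag in sent:
                self.pair_bigram[(prev, (word, tag))] += 1
                self.pair_history[prev] += 1
                self.tag_bigram[(prev[1], tag)] += 1
                self.tag_history[prev[1]] += 1
                self.emit[(tag, word)] += 1
                self.tag_count[tag] += 1
                self.tags.add(tag)
                self.vocab.add(word)
                prev = (word, tag)
        return self

    def prob(self, prev, word, tag):
        """Interpolated P(word, tag | prev), where prev = (w_prev, t_prev)."""
        # Standard HMM term, add-one smoothed (+1 vocabulary slot for unseen words).
        p_trans = (self.tag_bigram[(prev[1], tag)] + 1) / (
            self.tag_history[prev[1]] + len(self.tags))
        p_emit = (self.emit[(tag, word)] + 1) / (
            self.tag_count[tag] + len(self.vocab) + 1)
        # Lexicalized term: maximum-likelihood pair bigram (0 for unseen histories).
        hist = self.pair_history[prev]
        p_lex = self.pair_bigram[(prev, (word, tag))] / hist if hist else 0.0
        return self.lam * p_lex + (1 - self.lam) * p_trans * p_emit

    def viterbi(self, words):
        """Return the most probable hybrid-tag sequence for a word sequence."""
        best = {START[1]: (0.0, [])}  # tag -> (log score, best path ending in tag)
        prev_word = START[0]
        for word in words:
            new_best = {}
            for tag in self.tags:
                # The previous word is observed, so the DP state is just the tag.
                new_best[tag] = max(
                    ((score + math.log(self.prob((prev_word, pt), word, tag)),
                      path + [tag])
                     for pt, (score, path) in best.items()),
                    key=lambda cand: cand[0])
            best = new_best
            prev_word = word
        return max(best.values(), key=lambda cand: cand[0])[1]


def decode_entities(words, tags):
    """Group words into (entity, category) pairs from B/M/E/S-CAT hybrid tags."""
    entities, buf, cat = [], [], None
    for word, tag in zip(words, tags):
        if tag == "O":
            buf, cat = [], None
            continue
        pos, label = tag.split("-", 1)
        if pos in ("B", "S"):
            buf, cat = [word], label
        elif cat == label:
            buf.append(word)
        else:  # ill-formed M/E tag with no matching open entity: discard it
            buf, cat = [], None
            continue
        if pos in ("S", "E"):
            entities.append(("".join(buf), cat))
            buf, cat = [], None
    return entities


if __name__ == "__main__":
    # Toy training data with hypothetical hybrid tags.
    train = [
        [("王", "B-PER"), ("小明", "E-PER"), ("在", "O"), ("香港", "S-LOC"), ("工作", "O")],
        [("李", "B-PER"), ("小明", "E-PER"), ("去", "O"), ("北京", "S-LOC")],
    ]
    tagger = LexicalizedHMM().train(train)
    words = ["王", "小明", "在", "香港", "工作"]
    tags = tagger.viterbi(words)
    print(tags)                          # ['B-PER', 'E-PER', 'O', 'S-LOC', 'O']
    print(decode_entities(words, tags))  # [('王小明', 'PER'), ('香港', 'LOC')]
```

Because the lexicalized term conditions on the previous word as well as the previous tag, it can pick up word-specific context that tag-only transitions in a standard HMM cannot. Interpolating with the smoothed standard HMM keeps unseen word-tag pairs from zeroing out a path. Feeding single-character tokens instead of known words would give a character-based variant of the same sketch.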