Chinese lexical analysis using hierarchical hidden Markov model

Authors:
Hua-Ping Zhang;Qun Liu;Xue-Qi Cheng;Hao Zhang;Hong-Kui Yu
Affiliations:
Chinese Academy of Sciences, Beijing, China (all authors)
Venue:
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Year:
2003

Abstract

This paper presents a unified approach to Chinese lexical analysis using a hierarchical hidden Markov model (HHMM), which aims to incorporate Chinese word segmentation, part-of-speech tagging, disambiguation, and unknown word recognition into a single theoretical framework. A class-based HMM is applied to word segmentation, and at this level unknown words are treated in the same way as common words listed in the lexicon. Unknown words are recognized reliably by a role-based HMM. For disambiguation, the authors propose an N-shortest-path strategy that retains the top N segmentation results as candidates in the early stage and thereby covers more ambiguity. Experiments show that each level of the HHMM contributes to lexical analysis. An HHMM-based system, ICTCLAS, has been implemented. A recent official evaluation indicates that ICTCLAS is one of the best Chinese lexical analyzers. In short, the HHMM is effective for Chinese lexical analysis.
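The N-shortest-path idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the ICTCLAS implementation: the toy lexicon, its costs (standing in for negative log probabilities), the fallback cost for unlisted characters, and the function name `n_shortest_segmentations` are all assumptions made for illustration. The sketch treats each character boundary as a node, each lexicon word as a weighted edge, and keeps the N cheapest partial paths at every node, so that several competing segmentations survive for later stages to choose between.

```python
import heapq

# Hypothetical toy lexicon: word -> cost (lower is more likely).
# These numbers are illustrative assumptions, not values from the paper.
LEXICON = {
    "研": 6.0, "究": 6.0, "生": 5.0, "命": 5.0,
    "研究": 2.0, "研究生": 3.0, "生命": 2.0,
}
UNKNOWN_CHAR_COST = 10.0  # assumed cost for a single character not in the lexicon


def n_shortest_segmentations(text, n, lexicon=LEXICON):
    """Return up to n (cost, words) pairs, cheapest first."""
    max_len = max(len(word) for word in lexicon)
    # best[i] holds the n cheapest (cost, words) paths covering text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon:
                cost = lexicon[word]
            elif len(word) == 1:
                cost = UNKNOWN_CHAR_COST  # keep the lattice connected
            else:
                continue
            for prev_cost, prev_words in best[start]:
                candidates.append((prev_cost + cost, prev_words + [word]))
        # Retain only the n cheapest partial paths ending at this boundary.
        best[end] = heapq.nsmallest(n, candidates)
    return best[len(text)]


for cost, words in n_shortest_segmentations("研究生命", 2):
    print(cost, " / ".join(words))
```

With N = 2, the ambiguous string 研究生命 keeps both 研究 / 生命 and 研究生 / 命 as candidates instead of committing to a single segmentation early, which is the sense in which retaining the top N results covers more ambiguity.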