A new statistical formula for Chinese text segmentation incorporating contextual information
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
The Hierarchical Hidden Markov Model: Analysis and Applications
Machine Learning
A compression-based algorithm for Chinese word segmentation
Computational Linguistics
A trainable rule-based algorithm for word segmentation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Chinese named entity identification using class-based language model
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Covering ambiguity resolution in Chinese word segmentation based on contextual information
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
An agent-based approach to Chinese named entity recognition
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Learning case-based knowledge for disambiguating Chinese word segmentation: a preliminary study
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Combining classifiers for Chinese word segmentation
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Automatic recognition of Chinese unknown words based on roles tagging
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
An empirical study of sentiment analysis for chinese documents
Expert Systems with Applications: An International Journal
Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Large quantity of text classification based on the improved feature-line method
PRICAI'06 Proceedings of the 9th Pacific Rim international conference on Artificial intelligence
A Unified Character-Based Tagging Framework for Chinese Word Segmentation
ACM Transactions on Asian Language Information Processing (TALIP)
Detecting word misuse in Chinese
WSA '10 Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media
Construction and evaluation of Chinese emotion classification model
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Automatic evaluation of Chinese translation output: word-level or character-level?
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Chinese categorization and novelty mining
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Learning outliers to refine a corpus for chinese webpage categorization
ICNC'05 Proceedings of the First international conference on Advances in Natural Computation - Volume Part I
Revising word lattice using support vector machine for Chinese word segmentation
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Free-gram phrase identification for modeling Chinese text
Information Processing Letters
The application of kalman filter based human-computer learning model to chinese word segmentation
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Hi-index | 0.00 |
This paper presents a unified approach for Chinese lexical analysis using hierarchical hidden Markov model (HHMM), which aims to incorporate Chinese word segmentation, Part-Of-Speech tagging, disambiguation and unknown words recognition into a whole theoretical frame. A class-based HMM is applied in word segmentation, and in this level unknown words are treated in the same way as common words listed in the lexicon. Unknown words are recognized with reliability in role-based HMM. As for disambiguation, the authors bring forth an n-shortest-path strategy that, in the early stage, reserves top N segmentation results as candidates and covers more ambiguity. Various experiments show that each level in HHMM contributes to lexical analysis. An HHMM-based system ICTCLAS was accomplished. The recent official evaluation indicates that ICTCLAS is one of the best Chinese lexical analyzers. In a word, HHMM is effective to Chinese lexical analysis.