Ergodic HMMs have been used successfully to model sentence production. However, in some oriental languages such as Chinese, a word can consist of multiple characters, and there are no word-boundary markers between adjacent words in a sentence. Word segmentation of the training and test data is therefore necessary before an ergodic HMM can be applied as the language model. This paper introduces the N-th order Ergodic Multigram HMM for language modeling of such languages. Each state of the HMM can generate a variable number of characters corresponding to one word. The model can be trained without a word-segmented and tagged corpus, and both segmentation and tagging are trained in one single model. Results of its application to a Chinese corpus are reported.
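To illustrate the core idea — each HMM state emits a variable-length character string (one word), so segmentation and tagging are decided jointly in a single decode — here is a minimal Viterbi sketch. The tags, lexicon, and all probabilities below are invented toy values, and the first-order decoder is a simplified stand-in for the paper's N-th order model.

```python
import math

# Toy parameters for a first-order multigram HMM sketch.
# States are POS tags; each state emits a whole word (a variable-length
# character string). All probabilities below are hypothetical.
TAGS = ["N", "V"]
START = {"N": 0.6, "V": 0.4}                # initial state probabilities
TRANS = {("N", "N"): 0.3, ("N", "V"): 0.7,  # state transition probabilities
         ("V", "N"): 0.8, ("V", "V"): 0.2}
EMIT = {                                    # P(word | tag), toy lexicon
    "N": {"中国": 0.5, "人": 0.3, "中": 0.1, "国": 0.1},
    "V": {"爱": 0.9, "人": 0.1},
}
MAX_WORD_LEN = 2

def viterbi_segment_tag(chars):
    """Jointly segment and tag a character sequence in one Viterbi pass.

    best[i][t] holds the best log-probability of any segmentation and
    tagging of chars[:i] whose last word is tagged t.
    """
    n = len(chars)
    best = [{} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    best[0] = {None: 0.0}  # None marks the sentence start
    for i in range(1, n + 1):
        for k in range(1, min(MAX_WORD_LEN, i) + 1):
            word = chars[i - k:i]          # candidate word ending at i
            for tag in TAGS:
                p_emit = EMIT[tag].get(word)
                if p_emit is None:
                    continue
                for prev, score in best[i - k].items():
                    p_move = START[tag] if prev is None else TRANS[(prev, tag)]
                    cand = score + math.log(p_move) + math.log(p_emit)
                    if cand > best[i].get(tag, -math.inf):
                        best[i][tag] = cand
                        back[i][tag] = (i - k, prev, word)
    # Recover the best (segmentation, tagging) pair by backtracking.
    tag = max(best[n], key=best[n].get)
    i, result = n, []
    while i > 0:
        j, prev, word = back[i][tag]
        result.append((word, tag))
        i, tag = j, prev
    return list(reversed(result))
```

Because the word boundaries are latent, the same dynamic program can be embedded in forward-backward training to learn the parameters from an unsegmented, untagged corpus, which is the setting the paper targets.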