The processing of Japanese text is complicated by the absence of word delimiters. To segment Japanese text, systems typically rely on knowledge-based methods and large lexicons. This paper presents a novel approach to Japanese word segmentation that avoids the need for word lexicons and explicit rule bases: the algorithm uses a hidden Markov model to infer word boundaries stochastically. The method achieved 91% word-segmentation accuracy on a test corpus.
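To make the idea concrete, the sketch below shows how an HMM can mark word boundaries at the character level: each character is tagged B (word-begin) or I (word-internal), and the Viterbi algorithm finds the most likely tag sequence. This is a minimal illustration, not the paper's model; all probability tables and the vowel-based emission heuristic are toy values invented for the example.

```python
# Minimal sketch of HMM-based word segmentation via character tagging.
# States: B = first character of a word, I = word-internal character.
# All parameters below are hypothetical toy values, not trained ones.
import math

STATES = ("B", "I")

# Words must start with a B tag, so state I is impossible at position 0.
START = {"B": 0.0, "I": float("-inf")}  # log-probabilities
TRANS = {  # toy transition log-probabilities P(next state | state)
    ("B", "B"): math.log(0.4), ("B", "I"): math.log(0.6),
    ("I", "B"): math.log(0.5), ("I", "I"): math.log(0.5),
}

def emit(state, ch):
    """Toy emission model: vowels slightly prefer word-begin positions."""
    p = 0.6 if (ch in "aeiou") == (state == "B") else 0.4
    return math.log(p)

def viterbi_segment(text):
    """Split `text` at the B positions of the most likely tag sequence."""
    # Forward pass: best log-score and backpointer per (position, state).
    best = [{s: START[s] + emit(s, text[0]) for s in STATES}]
    back = [{}]
    for ch in text[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev, score = max(
                ((p, best[-1][p] + TRANS[(p, s)]) for p in STATES),
                key=lambda t: t[1],
            )
            col[s], ptr[s] = score + emit(s, ch), prev
        best.append(col)
        back.append(ptr)
    # Trace back the best state sequence.
    state = max(STATES, key=lambda s: best[-1][s])
    tags = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        tags.append(state)
    tags.reverse()
    # Cut the string immediately before every B tag.
    words, start = [], 0
    for i, t in enumerate(tags[1:], 1):
        if t == "B":
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

words = viterbi_segment("segmentation")
```

In a real system the transition and emission probabilities would be estimated from data rather than hand-set, and the emission model would condition on far richer features than vowel identity; the lattice search itself, however, is the standard Viterbi recurrence the segmentation rests on.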