Critical tokenization and its properties
Computational Linguistics
Splitting-merging model of Chinese word tokenization and segmentation
Natural Language Engineering
Syllable-based model for the Korean morphology
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Accessor variety criteria for Chinese word extraction
Computational Linguistics
Morphologically based automatic phonetic transcription
IBM Systems Journal
A system for the automatic segmentation of German words into morphs was developed. The main linguistic knowledge sources used by the system are a word syntax and a morph dictionary. The syntax is written in the formalism of right linear regular grammars and comprises approximately 1,400 rules describing the sequences of morph classes that underlie syntactically well-formed words. The morph dictionary contains almost 11,000 morphs, each assigned to up to 6 morph classes. Statistical evaluations with 6,000 text words showed that more than 99% of the segmented words were segmented correctly.
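The interplay of the two knowledge sources can be illustrated with a minimal sketch. It is not the system described above: the morphs, the class names (`PREFIX`, `STEM`, `SUFFIX`), the four rules, and the function `segment` are all hypothetical stand-ins chosen for illustration. The sketch assumes that a right linear grammar can be encoded as state transitions over morph classes, with accepting states standing in for terminating rules.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy morph dictionary: morph -> set of morph classes.
# (The system described in the text has almost 11,000 morphs; these are invented.)
MORPHS: Dict[str, Set[str]] = {
    "un": {"PREFIX"},
    "haus": {"STEM"},
    "tier": {"STEM"},
    "klar": {"STEM"},
    "heit": {"SUFFIX"},
    "e": {"SUFFIX"},
}

# Hypothetical right linear word syntax, encoded as transitions:
# (nonterminal, morph class) -> next nonterminal, i.e. rules of the form X -> class Y.
START = "S"
RULES: Dict[Tuple[str, str], str] = {
    ("S", "PREFIX"): "S",  # S -> PREFIX S
    ("S", "STEM"): "A",    # S -> STEM A
    ("A", "STEM"): "A",    # A -> STEM A   (compounding)
    ("A", "SUFFIX"): "A",  # A -> SUFFIX A
}
FINAL: Set[str] = {"A"}    # A -> (empty): a word may end in state A

Segmentation = List[Tuple[str, str]]  # list of (morph, morph class) pairs


def segment(word: str, state: str = START) -> List[Segmentation]:
    """Return every segmentation of `word` licensed by MORPHS and RULES."""
    if not word:
        # The whole word is consumed; accept only in a final state.
        return [[]] if state in FINAL else []
    results: List[Segmentation] = []
    for end in range(1, len(word) + 1):
        morph = word[:end]
        # Try every class the dictionary assigns to this candidate morph.
        for morph_class in sorted(MORPHS.get(morph, ())):
            next_state = RULES.get((state, morph_class))
            if next_state is None:
                continue  # the word syntax forbids this class here
            for rest in segment(word[end:], next_state):
                results.append([(morph, morph_class)] + rest)
    return results
```

Under these toy assumptions, `segment("haustiere")` yields the single analysis haus + tier + e, while `segment("ehaus")` yields no analysis because no rule allows a suffix in word-initial position. The dictionary proposes candidate morphs and the grammar filters out class sequences that do not form a well-formed word.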