Automatic Word Spacing Using Probabilistic Models Based on Character n-grams

Authors:
Do-Gil Lee;Hae-Chang Rim;Dongsuk Yook
Affiliations:
Korea University;Korea University;Korea University
Venue:
IEEE Intelligent Systems
Year:
2007

Citing 4

Tagging English text with a probabilistic model
Computational Linguistics

TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing

Automatic Word Segmentation for Chinese Classics of Tea Based on Tree-Pruning
KAM '09 Proceedings of the 2009 Second International Symposium on Knowledge Acquisition and Modeling - Volume 01

Equations for part-of-speech tagging
AAAI'93 Proceedings of the eleventh national conference on Artificial intelligence

Cited 3

A novel word segmentation approach for written languages with word boundary markers
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Automatic word spacing of erroneous sentences in mobile devices with limited hardware resources
Information Processing and Management: an International Journal

Automatic Korean word spacing using Pegasos algorithm
Information Processing and Management: an International Journal

Abstract

Automatic word spacing determines the correct boundaries between words in a sentence. Word spacing is important in Korean, where spacing errors are frequent. The authors propose several probabilistic word-spacing models that resolve problems with previous statistical approaches. These models treat automatic word spacing as a classification problem similar to part-of-speech tagging. By generalizing hidden Markov models, the models can consider a broader context and estimate probabilities more accurately. The authors tested the models under a wide range of conditions to compare them with the state of the art and performed a detailed error analysis.
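
To make the "word spacing as tagging" framing concrete, the sketch below labels each character with a tag (1 if a space follows it, 0 otherwise) and decodes the best tag sequence with a plain first-order hidden Markov model and Viterbi search. This is only a simplified baseline of the kind the abstract says the proposed models generalize, not the authors' models. The names `to_tagged`, `SpacingHMM`, and the smoothing constant `alpha` are hypothetical, and the toy training data is invented for illustration.

```python
import math
from collections import defaultdict


def to_tagged(sentence):
    """Turn a correctly spaced sentence into (char, tag) pairs.

    Tag 1 means a space follows the character, tag 0 means it does not.
    """
    pairs = []
    for word in sentence.split():
        for i, ch in enumerate(word):
            pairs.append((ch, 1 if i == len(word) - 1 else 0))
    return pairs


class SpacingHMM:
    """First-order HMM over spacing tags (illustrative baseline only)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha                    # additive smoothing (assumed value)
        self.trans = defaultdict(float)       # counts of (prev_tag, tag)
        self.emit = defaultdict(float)        # counts of (tag, char)
        self.prev_count = defaultdict(float)  # counts of prev_tag
        self.tag_count = defaultdict(float)   # counts of tag
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            prev = 1  # treat the sentence start like a position after a space
            for ch, tag in to_tagged(sentence):
                self.trans[(prev, tag)] += 1
                self.prev_count[prev] += 1
                self.emit[(tag, ch)] += 1
                self.tag_count[tag] += 1
                self.vocab.add(ch)
                prev = tag

    def _log_trans(self, prev, tag):
        num = self.trans[(prev, tag)] + self.alpha
        return math.log(num / (self.prev_count[prev] + 2 * self.alpha))

    def _log_emit(self, tag, ch):
        size = len(self.vocab) + 1  # one extra slot for unseen characters
        num = self.emit[(tag, ch)] + self.alpha
        return math.log(num / (self.tag_count[tag] + size * self.alpha))

    def space(self, text):
        """Strip existing spaces from text and re-insert them via Viterbi."""
        chars = [c for c in text if not c.isspace()]
        if not chars:
            return ""
        scores = {t: self._log_trans(1, t) + self._log_emit(t, chars[0])
                  for t in (0, 1)}
        backpointers = []
        for ch in chars[1:]:
            new_scores, ptr = {}, {}
            for t in (0, 1):
                best = max((0, 1),
                           key=lambda p: scores[p] + self._log_trans(p, t))
                new_scores[t] = (scores[best] + self._log_trans(best, t)
                                 + self._log_emit(t, ch))
                ptr[t] = best
            scores = new_scores
            backpointers.append(ptr)
        tag = 1  # the final character always ends a word
        tags = [tag]
        for ptr in reversed(backpointers):
            tag = ptr[tag]
            tags.append(tag)
        tags.reverse()
        pieces = []
        for ch, t in zip(chars, tags):
            pieces.append(ch)
            if t == 1:
                pieces.append(" ")
        return "".join(pieces).strip()


if __name__ == "__main__":
    model = SpacingHMM()
    model.train(["the cat sat", "the dog sat", "a cat ran"])  # toy data
    print(model.space("thecatsat"))
```

A model like this sees only the previous tag and the current character. The abstract's point is that conditioning on a broader context of characters and tags should give better estimates, at the cost of sparser counts that need more careful smoothing.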