Automatic word spacing of erroneous sentences in mobile devices with limited hardware resources

Authors:
Yeongkil Song;Harksoo Kim
Affiliations:
Program of Computer and Communications Engineering, College of Information Technology, Kangwon National University, 192-1, Hyoja 2(i)-dong, Chuncheon-si, Gangwon-do 200-701, Republic of Korea (both authors)
Venue:
Information Processing and Management: an International Journal
Year:
2011

Abstract

With the rapid evolution of the mobile environment, demand for natural language applications on mobile devices is increasing. This paper proposes an automatic word spacing system designed for mobile devices with limited hardware resources. Word spacing is the first processing step in natural language processing (NLP) for many languages that have their own word spacing rules. The proposed system works in two stages. In the first stage, it makes a preliminary correction of word spacing errors using a modified hidden Markov model based on character unigrams. In the second stage, it re-corrects the word spaces that the first stage miscorrected, using lexical rules based on character bigrams or longer combinations. This hybrid method improves robustness against unknown word patterns, reduces memory usage, and increases accuracy. To evaluate the system in a realistic mobile environment, we constructed a mobile-style colloquial corpus using a simple simulation method. In experiments on a commercial mobile phone, the system achieved a response time of 0.20 s per sentence, a memory usage of 2.04 MB, and an accuracy of 92-95% across the various evaluation measures.
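
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the hidden Markov model is modified, so the code below uses a plain first-order HMM with Viterbi decoding, where each character receives a tag (1 = insert a space after it, 0 = no space) and emissions depend on single characters only. The function names, the English toy input, every probability, and the single lexical rule are hypothetical values chosen for illustration.

```python
import math

def viterbi_spacing(chars, init, trans, emit, floor=1e-6):
    """Stage 1 (sketch): tag each character with 1 (space after) or 0 (no space)."""
    if not chars:
        return []
    states = (0, 1)
    # scores[t] = best log-probability of any tag path ending in tag t
    scores = {t: math.log(init[t]) + math.log(emit[t].get(chars[0], floor))
              for t in states}
    back = []  # one backpointer table per character after the first
    for ch in chars[1:]:
        new_scores, pointers = {}, {}
        for t in states:
            prev = max(states, key=lambda p: scores[p] + math.log(trans[p][t]))
            new_scores[t] = (scores[prev] + math.log(trans[prev][t])
                             + math.log(emit[t].get(ch, floor)))
            pointers[t] = prev
        scores = new_scores
        back.append(pointers)
    # Follow the backpointers from the best final tag.
    tags = [max(states, key=lambda t: scores[t])]
    for pointers in reversed(back):
        tags.append(pointers[tags[-1]])
    tags.reverse()
    return tags

def apply_lexical_rules(chars, tags, rules):
    """Stage 2 (sketch): override tags wherever a character n-gram rule matches.

    Each rule maps an n-gram (n >= 2) to one tag per character;
    None leaves the stage-1 tag unchanged.
    """
    fixed = list(tags)
    text = "".join(chars)
    for ngram, pattern in rules.items():
        start = text.find(ngram)
        while start != -1:
            for offset, tag in enumerate(pattern):
                if tag is not None:
                    fixed[start + offset] = tag
            start = text.find(ngram, start + 1)
    return fixed

def render(chars, tags):
    """Rebuild the sentence, inserting a space after every character tagged 1."""
    return "".join(ch + (" " if tag == 1 else "")
                   for ch, tag in zip(chars, tags)).strip()

# Hypothetical toy parameters (not taken from the paper).
init = {0: 0.7, 1: 0.3}
trans = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.9, 1: 0.1}}
emit = {
    0: {"t": 0.10, "h": 0.30, "e": 0.05, "c": 0.30, "a": 0.25},
    1: {"t": 0.45, "h": 0.03, "e": 0.45, "c": 0.03, "a": 0.04},
}
rules = {"th": (0, None)}  # never split "t" from a following "h"

chars = list("thecat")
stage1 = viterbi_spacing(chars, init, trans, emit)
stage2 = apply_lexical_rules(chars, stage1, rules)
print(render(chars, stage1))  # t he cat
print(render(chars, stage2))  # the cat
```

In this toy run the unigram model wrongly inserts a space after the first "t", because "t" often ends a word under the made-up emission table, and the bigram rule then undoes that error. This mirrors the division of labor the abstract describes: a compact character-unigram model handles unseen patterns, while a small set of longer lexical rules repairs its systematic mistakes.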