We present a novel spelling correction method for languages that have no delimiter between words, such as Japanese, Chinese, and Thai. It combines an approximate word matching method with an N-best word segmentation algorithm that uses a statistical language model. For OCR errors, the proposed word-based correction method outperforms the conventional character-based correction method: when the baseline character recognition accuracy is 90%, it achieves 96.0% character recognition accuracy and 96.3% word segmentation accuracy, whereas character-based correction reaches only 93.3% character recognition accuracy.
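To make the idea of word-based correction concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a unigram word model instead of a richer statistical language model, a 1-best Viterbi search instead of N-best search, and approximate matching restricted to a single character substitution per word. The function name `correct_and_segment`, the toy lexicon, and the parameters `p_err` and `p_unk` are all hypothetical and chosen only for illustration.

```python
import math


def correct_and_segment(text, lexicon, p_err=0.1, p_unk=1e-6):
    """Jointly segment `text` into words and repair substitution errors.

    Simplifying assumptions (not taken from the paper):
      * `lexicon` maps each word to a unigram probability,
      * a dictionary word may differ from the input span by at most one
        substituted character, and only if the word has 2+ characters,
      * each character is misrecognized independently with rate `p_err`,
      * any single character may pass through unchanged as an unknown
        word with probability `p_unk`.
    """
    max_len = max(len(w) for w in lexicon)
    n = len(text)
    best = [-math.inf] * (n + 1)   # best[i]: best log score of text[:i]
    back = [None] * (n + 1)        # back[i]: (start index, emitted word)
    best[0] = 0.0

    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            span = text[j:i]
            candidates = []
            # Approximate word matching: same length, at most one mismatch.
            for word, prob in lexicon.items():
                if len(word) != len(span):
                    continue
                d = sum(a != b for a, b in zip(span, word))
                if d == 0 or (d == 1 and len(word) > 1):
                    score = (math.log(prob)
                             + d * math.log(p_err)
                             + (len(word) - d) * math.log(1.0 - p_err))
                    candidates.append((score, word))
            # Unknown-word fallback keeps every position reachable.
            if len(span) == 1:
                candidates.append((math.log(p_unk), span))
            for score, word in candidates:
                if best[j] + score > best[i]:
                    best[i] = best[j] + score
                    back[i] = (j, word)

    # Follow the back-pointers to recover the best word sequence.
    words = []
    i = n
    while i > 0:
        j, word = back[i]
        words.append(word)
        i = j
    return words[::-1]


if __name__ == "__main__":
    # Toy lexicon with made-up probabilities.
    toy_lexicon = {"東京": 0.4, "都": 0.2, "に": 0.2, "行く": 0.2}
    # "亰" stands in for an OCR misrecognition of "京".
    print(correct_and_segment("東亰都に行く", toy_lexicon))
```

Because the search scores whole words, a misrecognized character can be repaired using evidence from its neighbours within the same word, which is the intuition behind preferring word-based over character-based correction. Extending the sketch toward the method described above would mean keeping several hypotheses per position and replacing the unigram score with a context-dependent language model.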