Context-based spelling correction for Japanese OCR

Authors: Masaaki Nagata
Affiliations: NTT Information and Communication Systems Laboratories, Kanagawa, Japan
Venue: COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year: 1996

Abstract

We present a novel spelling correction method for languages that have no delimiters between words, such as Japanese, Chinese, and Thai. It combines an approximate word matching method with an N-best word segmentation algorithm that uses a statistical language model. For OCR errors, the proposed word-based correction method outperforms the conventional character-based correction method: when the baseline character recognition accuracy is 90%, the word-based method achieves 96.0% character recognition accuracy and 96.3% word segmentation accuracy, whereas character-based correction reaches only 93.3% character recognition accuracy.
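
The combination of approximate word matching and segmentation described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a toy Latin-letter lexicon with made-up unigram probabilities, substitution-only matching, and a 1-best Viterbi search in place of the N-best search and the statistical language model the abstract mentions. All names (`LEXICON`, `candidates`, `correct`) and all constants are hypothetical.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability (illustrative values only).
LEXICON = {"the": 0.4, "cat": 0.2, "sat": 0.2, "mat": 0.1, "on": 0.1}

P_CORRECT = 0.9                # assumed per-character OCR accuracy
P_ERROR = 0.1                  # assumed probability of a substituted character
UNKNOWN_LOGP = math.log(1e-6)  # assumed penalty for passing one unmatched character through


def candidates(chunk, max_subs=1):
    """Approximate word matching: same-length lexicon words within max_subs substitutions."""
    out = []
    for word, prob in LEXICON.items():
        if len(word) != len(chunk):
            continue
        subs = sum(a != b for a, b in zip(word, chunk))
        if subs <= max_subs:
            # Simple channel score: reward matching characters, penalize substitutions.
            channel = (len(word) - subs) * math.log(P_CORRECT) + subs * math.log(P_ERROR)
            out.append((word, math.log(prob) + channel))
    return out


def correct(text, max_len=3):
    """Viterbi search for the best-scoring word sequence over unsegmented input."""
    # best[i] = (score, back pointer, word) for the best path covering text[:i]
    best = [(-math.inf, None, None)] * (len(text) + 1)
    best[0] = (0.0, None, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            chunk = text[start:end]
            options = candidates(chunk)
            if len(chunk) == 1:
                # Fallback so every position stays reachable.
                options.append((chunk, UNKNOWN_LOGP))
            for word, logp in options:
                score = best[start][0] + logp
                if score > best[end][0]:
                    best[end] = (score, start, word)
    # Follow back pointers from the end of the text to recover the word sequence.
    words, pos = [], len(text)
    while pos > 0:
        _, prev, word = best[pos]
        words.append(word)
        pos = prev
    return words[::-1]
```

With this toy lexicon, `correct("thecetsatonthemat")` returns `['the', 'cat', 'sat', 'on', 'the', 'mat']`: the misrecognized chunk "cet" is replaced by the lexicon word "cat" during segmentation, so correction and word segmentation happen in one pass. A character-based corrector would instead score each character in isolation, without the word-level constraint.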