Japanese OCR error correction using character shape similarity and statistical language model

Authors: Masaaki Nagata
Affiliations: NTT Information and Communication Systems Laboratories, Kanagawa, Japan
Venue: COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Year: 1998

Abstract

We present a novel OCR error correction method for languages without word delimiters that have a large character set, such as Japanese and Chinese. It consists of a statistical OCR model, an approximate word matching method using character shape similarity, and a word segmentation algorithm using a statistical language model. By using a statistical OCR model and character shape similarity, the proposed error corrector outperforms the previously published method. When the baseline character recognition accuracy is 90%, it achieves 97.4% character recognition accuracy.
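
The three components named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the vocabulary, the probabilities, the shape-similarity pairs, and the function names channel_prob and correct are all hypothetical, and Latin letters and digits stand in for Japanese characters. The sketch assumes substitution errors only, so each candidate word has the same length as the input span it covers. A unigram table plays the role of the statistical language model, a per-character confusion probability plays the role of the OCR model with shape similarity, and a Viterbi search does the word segmentation.

```python
import math

# Toy unigram "language model" (hypothetical values, not from the paper).
WORD_PROB = {"cold": 0.3, "tea": 0.3, "old": 0.2, "at": 0.1, "c": 0.05, "e": 0.05}

# Toy shape-similarity pairs: characters an OCR engine might confuse.
SIMILAR = {("o", "0"), ("l", "1")}


def channel_prob(observed, intended):
    """Toy P(observed | intended) for one character, substitutions only."""
    if observed == intended:
        return 0.9
    if (intended, observed) in SIMILAR or (observed, intended) in SIMILAR:
        return 0.09  # shape-similar characters are plausible confusions
    return 1e-6      # any other substitution is very unlikely


def correct(observed):
    """Viterbi search for the most probable word sequence behind `observed`."""
    n = len(observed)
    # best[i] = (log score of best path covering observed[:i], (start, word))
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for word, prob in WORD_PROB.items():
            start = end - len(word)
            if start < 0 or best[start][0] == -math.inf:
                continue
            # Language-model score plus channel score for this span.
            score = best[start][0] + math.log(prob)
            for obs_ch, true_ch in zip(observed[start:end], word):
                score += math.log(channel_prob(obs_ch, true_ch))
            if score > best[end][0]:
                best[end] = (score, (start, word))
    if best[n][0] == -math.inf:
        return None  # no segmentation covers the whole input
    # Follow back-pointers from the end to recover the word sequence.
    words, pos = [], n
    while pos > 0:
        start, word = best[pos][1]
        words.append(word)
        pos = start
    return words[::-1]
```

With these toy tables, correct("c01dtea") returns ['cold', 'tea']: the digits 0 and 1 are read as the shape-similar letters o and l, and the language model prefers the single word "cold" over the split "c" + "old".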