Context-based spelling correction for Japanese OCR

  • Authors: Masaaki Nagata
  • Affiliations: NTT Information and Communication Systems Laboratories, Kanagawa, Japan
  • Venue: COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
  • Year: 1996

Abstract

We present a novel spelling correction method for languages that have no delimiter between words, such as Japanese, Chinese, and Thai. It combines an approximate word matching method with an N-best word segmentation algorithm that uses a statistical language model. For OCR errors, the proposed word-based correction method outperforms the conventional character-based correction method. When the baseline character recognition accuracy is 90%, the word-based method achieves 96.0% character recognition accuracy and 96.3% word segmentation accuracy, whereas character-based correction reaches 93.3% character recognition accuracy.
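
The general idea of word-based correction on unsegmented text can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a hypothetical toy lexicon with made-up unigram probabilities, allows only same-length character substitutions as the approximate match, and keeps only the single best segmentation (a 1-best Viterbi search) rather than N-best candidates. Unspaced lowercase English stands in for a language without word delimiters, and all names and constants are illustrative.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability (illustrative values only).
LEXICON = {"the": 0.30, "cat": 0.20, "sat": 0.20, "on": 0.15, "mat": 0.15}

MAX_SUBSTITUTIONS = 1        # assumed limit on OCR substitutions per word
SUBSTITUTION_PENALTY = 3.0   # assumed cost added per substituted character
UNKNOWN_CHAR_COST = 20.0     # assumed cost of passing one character through unchanged


def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def correct(text):
    """Segment unspaced text and replace each segment with its best lexicon word."""
    n = len(text)
    best = [math.inf] * (n + 1)   # best[i]: lowest cost of explaining text[:i]
    back = [None] * (n + 1)       # back[i]: (segment start, emitted word)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Fallback: emit one character as-is so every position stays reachable.
        cost = best[end - 1] + UNKNOWN_CHAR_COST
        if cost < best[end]:
            best[end], back[end] = cost, (end - 1, text[end - 1])
        # Approximate match: lexicon words of the same length as the segment,
        # differing in at most MAX_SUBSTITUTIONS characters.
        for word, prob in LEXICON.items():
            start = end - len(word)
            if start < 0:
                continue
            subs = mismatches(text[start:end], word)
            if subs > MAX_SUBSTITUTIONS:
                continue
            cost = best[start] - math.log(prob) + SUBSTITUTION_PENALTY * subs
            if cost < best[end]:
                best[end], back[end] = cost, (start, word)
    # Follow the back-pointers from the end to recover the word sequence.
    words, pos = [], n
    while pos > 0:
        start, word = back[pos]
        words.append(word)
        pos = start
    return words[::-1]


# "cot" is a simulated OCR error for "cat".
print(correct("thecotsatonthemat"))
```

Because segmentation and correction are decided jointly, the misrecognized "cot" is repaired to "cat" while the word boundaries are recovered in the same pass. A fuller treatment would also need insertion and deletion errors, a higher-order language model, and an unknown-word model, none of which this sketch attempts.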