We present a novel OCR error correction method for languages that have no word delimiters and a large character set, such as Japanese and Chinese. The method combines a statistical OCR model, approximate word matching based on character shape similarity, and a word segmentation algorithm driven by a statistical language model. By using the statistical OCR model and character shape similarity, the proposed error corrector outperforms the previously published method. When the baseline character recognition accuracy is 90%, it achieves 97.4% character recognition accuracy.
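The interplay of the three components can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a unigram word model in place of the statistical language model, a hand-written confusion table in place of the OCR model and shape similarity measure, and substitution errors only. The lexicon, the probabilities, and the names `LEXICON`, `CONFUSION`, `P_CORRECT`, `P_OTHER`, and `correct` are all hypothetical.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability (illustrative values only).
LEXICON = {"東京": 0.4, "大学": 0.3, "犬": 0.2, "学": 0.1}

# Hypothetical shape-confusion table: (true char, observed char) -> probability.
CONFUSION = {("大", "犬"): 0.1, ("犬", "大"): 0.1}
P_CORRECT = 0.9   # assumed probability that a character is read correctly
P_OTHER = 1e-6    # assumed floor for confusions missing from the table


def char_logprob(true_ch, seen_ch):
    """Log probability that the OCR reads true_ch as seen_ch."""
    if true_ch == seen_ch:
        return math.log(P_CORRECT)
    return math.log(CONFUSION.get((true_ch, seen_ch), P_OTHER))


def channel_logprob(word, seen):
    """Log probability of observing `seen` given the true word (same length)."""
    return sum(char_logprob(t, s) for t, s in zip(word, seen))


def correct(observed):
    """Segment and correct `observed` by dynamic programming over end positions."""
    n = len(observed)
    # best[i] holds (score, words) for the best analysis of observed[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for word, prob in LEXICON.items():
            start = end - len(word)
            if start < 0 or best[start][0] == -math.inf:
                continue
            score = (best[start][0] + math.log(prob)
                     + channel_logprob(word, observed[start:end]))
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1]  # empty list if no lexicon analysis covers the input


print(correct("東京犬学"))
```

In this toy setting the input "東京犬学" is analysed as "東京" followed by "大学": the word probability of "大学" outweighs the cost of assuming that "大" was misread as the similar-looking "犬". A realistic system would also need to handle unknown words and higher-order language models, which this sketch leaves out.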