Contextual postprocessing of a Korean OCR system by linguistic constraints

Authors:
Hyuk-Chul Kwon;Ho-Jeong Hwang;Min-Jung Kim;Seong-Whan Lee
Venue:
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 2)
Year:
1995

Abstract

This paper presents a contextual postprocessing approach that selects the most feasible word from the multiple output strings of an OCR system. Correction is applied only when this selection fails. The selected word is confirmed by its collocation with the adjacent words. The system performs five functions: (1) selecting a word from the candidate words, (2) correcting candidate words using a confusion matrix of syllables, (3) combining two substrings into a word that spans two lines, (4) guessing unknown nouns, and (5) confirming a selected word using the contextual information of adjacent words. To improve speed, we use syllable di-grams and viable prefixes of Korean words. Experimental results show that these two heuristics speed up the system by a factor of more than 1,000 in the worst case. Our system improves the word recognition rate of the OCR system from 90.50% to 94.72%.
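
The select-then-correct flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the romanized syllables standing in for Hangul syllables, the confusion table, and the function names `search` and `postprocess` are all assumptions made for illustration. The sketch covers only candidate selection, viable-prefix pruning, and confusion-matrix correction; di-gram filtering, line-spanning words, unknown-noun guessing, and collocation checks are left out.

```python
# Hypothetical toy lexicon: each word is a tuple of (romanized) syllables.
LEXICON = {("hak", "gyo"), ("hak", "saeng"), ("seon", "saeng")}

# Every non-empty prefix of a lexicon word (including the whole word) is "viable".
VIABLE_PREFIXES = {w[:i] for w in LEXICON for i in range(1, len(w) + 1)}

# Hypothetical confusion table: recognized syllable -> syllables it may stand for.
CONFUSIONS = {"hag": ["hak"], "gyu": ["gyo"]}


def search(position_candidates):
    """Return lexicon words built from one candidate syllable per position.

    A partial string is abandoned as soon as it is not a viable prefix,
    so most combinations are never enumerated.
    """
    results = []

    def extend(prefix, i):
        if i == len(position_candidates):
            if prefix in LEXICON:
                results.append(prefix)
            return
        for syllable in position_candidates[i]:
            longer = prefix + (syllable,)
            if longer in VIABLE_PREFIXES:
                extend(longer, i + 1)

    extend((), 0)
    return results


def postprocess(position_candidates):
    """Select a word first; fall back to correction only if selection fails."""
    words = search(position_candidates)
    if words:
        return words
    # Correction step: widen each position with its confusable syllables.
    expanded = [
        list(dict.fromkeys(c + [alt for s in c for alt in CONFUSIONS.get(s, [])]))
        for c in position_candidates
    ]
    return search(expanded)
```

For example, `postprocess([["hag", "hak"], ["gyo"]])` selects a word directly from the OCR candidates, while `postprocess([["hag"], ["gyu"]])` finds no lexicon word among the candidates and reaches one only through the confusion table. Pruning on viable prefixes is what keeps the search from enumerating every combination of candidate syllables, which is the kind of saving the abstract attributes to its two heuristics.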