A Study on Utilizing OCR Technology in Building Text Database

Authors:
Sun-Hwa Hahn;Joon Ho Lee;Jin-Hyung Kim
Venue:
DEXA '99 Proceedings of the 10th International Workshop on Database & Expert Systems Applications
Year:
1999

Abstract

Optical character recognition (OCR) may be the most plausible method for building a database from printed documents. This paper describes the points to consider when selecting an OCR system for that purpose. Based on our experiments with four commercial OCR systems, we chose the one with the highest recognition rate to build an OCR-text database. Its character recognition rate was 90.5% over 970 Korean-language abstracts from conference proceedings. This recognition rate is still insufficient for practical use. To make practical use of OCR texts, which contain approximately 10% character-level errors, we need to investigate whether automatic indexing yields acceptable retrieval performance. It is also necessary to evaluate which indexing method performs better. Experimental results show that 2-gram indexing provides retrieval efficiency similar to that of morpheme-based indexing for the Korean OCR text database. In addition, the results retrieved from the indexed OCR texts are similar to those selected by experts.
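
As an illustration of why 2-gram indexing can tolerate character-level OCR errors, the following is a minimal Python sketch, not the authors' implementation. It assumes that a 2-gram is a pair of adjacent Hangul syllables within a whitespace-separated token, and that documents are ranked by the number of distinct query 2-grams they share. The function names (char_bigrams, build_index, search) and the sample documents are hypothetical.

```python
from collections import defaultdict


def char_bigrams(text):
    """Return overlapping character 2-grams for each whitespace-separated token."""
    grams = []
    for token in text.split():
        if len(token) < 2:
            grams.append(token)  # keep single-character tokens as they are
        else:
            grams.extend(token[i:i + 2] for i in range(len(token) - 1))
    return grams


def build_index(docs):
    """Map each 2-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_bigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many distinct query 2-grams they contain."""
    scores = defaultdict(int)
    for gram in set(char_bigrams(query)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: "d1" simulates an OCR misread of one syllable
# in the word for "information retrieval".
docs = {
    "d1": "정보겁색 시스템",
    "d2": "문자인식 기술",
}
index = build_index(docs)
results = search(index, "정보검색")
print(results)
```

In this toy example the misread word no longer matches the query as a whole, but it still shares one intact 2-gram with it, so the damaged document is still retrieved. A morpheme-based indexer would depend on the analyzer recognizing the corrupted word, which is not modeled here.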