Document Retrieval System Tolerant of Segmentation Errors of Document Images

Authors:
Takeshi Nagasaki;Toshikazu Takahashi;Katsumi Marukawa
Affiliations:
Hitachi, Ltd.;Hitachi, Ltd.;Hitachi, Ltd.
Venue:
IWFHR '04 Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition
Year:
2004

Abstract

This paper describes a new document retrieval method that is tolerant of OCR segmentation errors in document images. To overcome the segmentation and recognition errors that most OCR-based retrieval systems suffer from, the proposed method works in two phases. First, the OCR engine generates multiple character-segmentation and recognition hypotheses. Then the retrieval engine extracts keywords from these hypotheses using lexicon-driven dynamic programming (DP) matching. We applied the method to both handwritten and printed document images and demonstrated its effectiveness in reducing false drops and false alarms.
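
The two phases can be illustrated with a minimal sketch. This is not the authors' implementation: the lattice representation, the additive confidence scores, and the names spot_keyword and EDGES are all assumptions made for illustration. Phase one is assumed to yield a set of segmentation hypotheses, each spanning two cut points in a text line and carrying candidate characters with scores. Phase two is a DP pass that, for a given lexicon keyword, finds the best-scoring chain of adjoining hypotheses that spells it.

```python
from collections import defaultdict


def spot_keyword(edges, keyword):
    """Find the best-scoring path in a segmentation lattice that spells keyword.

    edges: iterable of (start, end, {char: score}) hypotheses, where start and
           end are cut-point indices and higher scores are better (assumed).
    Returns (total_score, start, end), or None if the keyword cannot be spelled.
    """
    if not keyword:
        return None

    by_start = defaultdict(list)
    nodes = set()
    for start, end, candidates in edges:
        by_start[start].append((end, candidates))
        nodes.update((start, end))

    # layer maps a cut point to (best score, origin cut point) after matching
    # the keyword prefix processed so far. A match may begin at any cut point.
    layer = {node: (0.0, node) for node in nodes}
    for ch in keyword:
        next_layer = {}
        for node, (score, origin) in layer.items():
            for end, candidates in by_start.get(node, ()):
                if ch not in candidates:
                    continue
                total = score + candidates[ch]
                if end not in next_layer or total > next_layer[end][0]:
                    next_layer[end] = (total, origin)
        if not next_layer:
            return None  # no chain of hypotheses spells this prefix
        layer = next_layer

    end, (score, origin) = max(layer.items(), key=lambda item: item[1][0])
    return score, origin, end


# Hypothetical lattice: the strokes between cut points 2 and 4 are segmented
# both as two characters ("r" + "n") and as one character ("m").
EDGES = [
    (0, 1, {"c": 0.9, "e": 0.3}),
    (1, 2, {"o": 0.8}),
    (2, 3, {"r": 0.6, "n": 0.2}),
    (3, 4, {"n": 0.7, "r": 0.1}),
    (2, 4, {"m": 0.8}),
    (4, 5, {"e": 0.9}),
]

for word in ("come", "corne", "cat"):
    print(word, spot_keyword(EDGES, word))
```

Because both segmentations of the ambiguous strokes stay in the lattice, the keyword "come" can be found even if the OCR engine's single best segmentation would have read "corne". A keyword with no supporting path, such as "cat" here, yields None.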