A Segmentation-free Approach for Keyword Search in Historical Typewritten Documents

Authors:
B. Gatos;T. Konidaris;K. Ntzios;I. Pratikakis;S. J. Perantonis
Affiliations:
Institute of Informatics and Telecommunications, Athens, Greece;Institute of Informatics and Telecommunications, Athens, Greece;Institute of Informatics and Telecommunications, Athens, Greece;Institute of Informatics and Telecommunications, Athens, Greece;Institute of Informatics and Telecommunications, Athens, Greece
Venue:
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Year:
2005

Abstract

In this paper, we propose a novel segmentation-free approach for keyword search in historical typewritten documents that combines image preprocessing, synthetic data creation, word spotting and user feedback technologies. Our aim is to search for keywords typed by the user in a large collection of digitized typewritten historical documents. The proposed method is based on: (i) image preprocessing for image binarization and enhancement, noisy border and frame removal, and orientation and skew correction; (ii) creation of synthetic word images from keywords typed by the user; (iii) word segmentation using dynamic parameters; (iv) efficient feature extraction for each word image; and (v) a retrieval procedure that is optimized by user feedback. Experimental results demonstrate the efficiency of the proposed approach.
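
As a rough illustration of steps (iv) and (v) only, the sketch below ranks candidate word images against a synthetic query image and then refines the query from user-accepted matches. It is not the authors' method: the abstract does not specify the features, the distance measure, or the feedback rule, so the zone-density features, the Euclidean distance, and the averaging update used here are assumptions chosen for brevity. All function names (`zone_features`, `rank`, `refine`) are hypothetical, and images are assumed to be binary grids already produced by steps (i) to (iii).

```python
from math import sqrt

def zone_features(img, rows=2, cols=4):
    """Fraction of foreground (1) pixels in each cell of a rows x cols grid.

    Assumed feature set for illustration; the paper's features may differ.
    """
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats

def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed measure)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query_feats, candidates):
    """Return candidate ids sorted by increasing distance to the query."""
    return sorted(candidates, key=lambda k: distance(query_feats, candidates[k]))

def refine(query_feats, candidates, accepted_ids):
    """Assumed feedback rule: average the query with user-accepted matches."""
    vecs = [query_feats] + [candidates[k] for k in accepted_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Toy data: a synthetic query image and three segmented document words.
synthetic = [[1, 1, 0, 0],
             [1, 1, 0, 0]]
document_words = {
    "a": [[1, 1, 0, 0], [1, 0, 0, 0]],
    "b": [[0, 0, 1, 1], [0, 0, 1, 1]],
    "c": [[1, 0, 0, 0], [1, 0, 0, 0]],
}

query = zone_features(synthetic)
candidates = {k: zone_features(v) for k, v in document_words.items()}
first_pass = rank(query, candidates)          # initial retrieval
refined_query = refine(query, candidates, ["c"])  # user marks "c" as correct
```

Averaging the query with accepted results moves it toward how the keyword actually appears in the degraded documents, which is one simple way a synthetic query could adapt to real typewriter output.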