Application of Multi-Level Classifiers and Clustering for Automatic Word Spotting in Historical Document Images

Authors:
Reza Farrahi Moghaddam;Mohamed Cheriet
Venue:
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
Year:
2009

Cited by: 8

A multi-scale framework for adaptive binarization of degraded document images
Pattern Recognition

IBN SINA: a database for research on processing and understanding of Arabic manuscripts images
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

A spatially adaptive statistical method for the binarization of historical manuscripts and degraded document images
Pattern Recognition

TSV-LR: topological signature vector-based lexicon reduction for fast recognition of pre-modern Arabic subwords
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing

W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents
Pattern Recognition

A synthesised word approach to word retrieval in handwritten documents
Pattern Recognition

A learning framework for the optimization and automation of document binarization methods
Computer Vision and Image Understanding

Learning-based word spotting system for Arabic handwritten documents
Pattern Recognition

Abstract

A complete system for preprocessing and word spotting in very old historical document images is presented. Salient information is extracted from the document images using a word spotting technique that is language independent and requires neither line nor word segmentation. A multi-class library of the connected components of the document text is created based on six features. Spotting is performed using a Euclidean distance measure enhanced by rotation and dynamic time warping transforms. The method is applied to a dataset from the Juma Al Majid Center (Dubai) with promising results. This performance is obtained with an automatic preprocessing stage in which content-level classifiers robustly extract accurate stroke pixels. The preprocessed document images are also more legible to the end user and less costly to archive and transfer.
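
As an illustration of the matching step named in the abstract, the following is a minimal sketch of dynamic time warping with a Euclidean local cost, used to compare a query against a library of connected-component descriptors. It is not the authors' implementation: the abstract does not specify the six features or the rotation transform, so each component is assumed here to be a sequence of equal-length feature vectors, and the names `dtw_distance`, `spot`, and `threshold` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Assumption: each element of `a` and `b` is a tuple of floats of the same
    dimension. The local cost is the Euclidean distance between two vectors.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier element of b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier element of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def spot(query, library, threshold):
    """Return indices of library entries within `threshold` of the query.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return [
        k for k, item in enumerate(library)
        if dtw_distance(query, item) <= threshold
    ]
```

Because the warping path may repeat elements, two sequences that differ only in local stretching receive a distance of zero, which is the property that makes this kind of measure attractive for handwriting with variable stroke lengths.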