Information spotting in scanned historical document images is a challenging task. The joint use of the mechanical press and of human-controlled inking introduced great variability in ink level within a book or even within a page. Consequently, characters are often broken or merged together and thus become difficult to segment and recognize. The limitations of commercial OCR engines for information retrieval in historical document images have inspired alternative means of identifying given words in such documents. We present a word spotting method for scanned documents that finds the word images similar to a query word without assuming a correct segmentation of the words into characters. The connected components are first processed to transform a word pattern into a sequence of sub-patterns. Each sub-pattern is represented by a sequence of feature vectors. A modified Edit distance is proposed to perform a segmentation-driven string matching and to compute the Segmentation Driven Edit (SDE) distance between the words to be compared. The set of SDE operations is defined so as to obtain the word segmentations that are the most appropriate for evaluating their similarity. These operations cope effectively with broken and touching characters in words. The distortion of character shapes is handled by coupling the string matching process with local shape comparisons performed by Dynamic Time Warping (DTW). The costs of the SDE operations are provided by the DTW distances. A sub-optimal version of the SDE string matching is also proposed to reduce the computation time; it did not lead to a great decrease in performance. A query can be given by example or typed as text on the keyboard. Textual queries can be used to spot the word directly, without the need to synthesize its image, provided that character prototype images are available.
Results are presented for different documents and compared with other methods, showing the efficiency of our method.
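
The coupling of string matching with DTW described above can be illustrated with a minimal sketch. It is not the SDE distance as defined in the paper: the function names, the `max_merge` and `gap_cost` parameters, the Euclidean local cost and the unnormalised DTW are all assumptions made for illustration. A word is taken to be a list of sub-patterns, and each sub-pattern a non-empty list of feature vectors. Besides one-to-one matches, the edit distance allows several consecutive sub-patterns of one word to be merged and matched against a single sub-pattern of the other, which is one plausible way to absorb broken or touching characters.

```python
from math import inf


def dtw(a, b):
    """Unnormalised DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between two feature vectors (assumed local cost).
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def sde_like_distance(word_a, word_b, max_merge=3, gap_cost=1.0):
    """Edit distance over sub-patterns with k-to-1 and 1-to-k merge operations.

    max_merge and gap_cost are hypothetical parameters, not taken from the paper.
    """
    n, m = len(word_a), len(word_b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:  # skip a sub-pattern of word_a
                best = min(best, D[i - 1][j] + gap_cost)
            if j > 0:  # skip a sub-pattern of word_b
                best = min(best, D[i][j - 1] + gap_cost)
            for k in range(1, max_merge + 1):
                for l in range(1, max_merge + 1):
                    if k > 1 and l > 1:
                        continue  # only 1-to-1, k-to-1 and 1-to-l matches
                    if i >= k and j >= l:
                        # Concatenate the merged sub-patterns into one sequence.
                        seq_a = [v for p in word_a[i - k:i] for v in p]
                        seq_b = [v for p in word_b[j - l:j] for v in p]
                        best = min(best, D[i - k][j - l] + dtw(seq_a, seq_b))
            D[i][j] = best
    return D[n][m]


# Toy data: in `broken`, the first character of `query` is split in two.
query = [[[1.0], [2.0]], [[5.0]]]
broken = [[[1.0]], [[2.0]], [[5.0]]]
print(sde_like_distance(query, broken))               # merges allowed
print(sde_like_distance(query, broken, max_merge=1))  # merges disabled
```

In this toy case the merge operation lets the two fragments of the broken character be compared as a whole with the intact one, so the distance is lower than when merging is disabled. The exhaustive inner loops make the sketch slow on long words, which is the kind of cost a sub-optimal variant would aim to reduce.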