Word Searching in Document Images Using Word Portion Matching

Authors:
Yue Lu;Chew Lim Tan
Venue:
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Year:
2002

Abstract

This paper proposes an approach for searching for a word portion in document images, to facilitate detecting and locating user-specified query words. A feature string is synthesized from the character sequence of the user-specified word, and each word image extracted from the documents is likewise represented by a feature string. An inexact string matching technique then measures the similarity between the two feature strings; from this similarity we estimate how relevant the document word image is to the user-specified word and decide whether a portion of it matches that word. Experimental results on real document images show that the approach is promising: it can detect and locate document words that match the user-specified word either entirely or partially.
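
The portion-matching step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature codes or the matching costs, so plain character strings stand in for feature strings and every edit operation is assumed to cost 1. The function names `best_portion_match` and `portion_similarity` are hypothetical. The sketch uses a standard approximate substring matching recurrence, in which the query may begin at any position of the document word at no cost.

```python
from typing import List, Tuple


def best_portion_match(query: str, target: str) -> Tuple[int, int, int]:
    """Return (distance, start, end) for the portion target[start:end]
    that is closest to `query` under unit-cost edit distance.

    Assumption: strings of characters stand in for feature strings.
    """
    n = len(target)
    # Row 0: an empty query matches the empty portion at any position for free.
    prev_cost: List[int] = [0] * (n + 1)
    prev_start: List[int] = list(range(n + 1))

    for i in range(1, len(query) + 1):
        # Column 0: query[:i] against an empty portion costs i deletions.
        cur_cost = [i] + [0] * n
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            substitute = prev_cost[j - 1] + (query[i - 1] != target[j - 1])
            skip_query = prev_cost[j] + 1      # query symbol left unmatched
            skip_target = cur_cost[j - 1] + 1  # extra symbol in the document word
            best = min(substitute, skip_query, skip_target)
            cur_cost[j] = best
            # Carry along where the matched portion started.
            if best == substitute:
                cur_start[j] = prev_start[j - 1]
            elif best == skip_query:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev_cost, prev_start = cur_cost, cur_start

    # The best portion may end anywhere in the document word.
    end = min(range(n + 1), key=lambda j: prev_cost[j])
    return prev_cost[end], prev_start[end], end


def portion_similarity(query: str, target: str) -> float:
    """Hypothetical relevance score in [0, 1]: 1 minus the normalized distance."""
    if not query:
        return 1.0
    distance, _, _ = best_portion_match(query, target)
    return max(0.0, 1.0 - distance / len(query))


if __name__ == "__main__":
    dist, start, end = best_portion_match("health", "unhealthy")
    print(dist, start, end, "unhealthy"[start:end])
    print(portion_similarity("helth", "unhealthy"))
```

In this sketch a document word would be reported as a hit when its score exceeds a chosen threshold, and the returned `start` and `end` indices locate the matching portion inside the word. A real system would replace the unit costs with costs that reflect how easily one feature code is confused with another.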