Word Spotting in Bitmapped Fax Documents

Authors:
William J. Williams;Eugene J. Zalubas;Alfred O. Hero, III
Affiliations:
Electrical Engineering and Computer Science Dept., University of Michigan, Ann Arbor MI 48109, USA;Electrical Engineering and Computer Science Dept., University of Michigan, Ann Arbor MI 48109, USA;Electrical Engineering and Computer Science Dept., University of Michigan, Ann Arbor MI 48109, USA
Venue:
Information Retrieval
Year:
2000

Abstract

Images and signals may be represented by forms that are invariant to time shifts, spatial shifts, frequency shifts, and scale changes; advances in time-frequency analysis and scale transform techniques have made this possible. However, factors such as noise contamination and “style” differences complicate matters. Text is an example: letters and words may vary in size and position, and complicating variations include the font used, corruption during facsimile (fax) transmission, and printer characteristics. The solution advanced in this paper is to cast the desired invariants into separate subspaces for each extraneous factor or group of factors. The first goal is to have minimal overlap between these subspaces, and the second is to identify each subspace accurately. Concepts borrowed from high-resolution spectral analysis, but adapted uniquely to this problem, have been found useful in this context. Once the pertinent subspace is identified, recognizing a particular invariant form within it is relatively simple using well-known singular value decomposition (SVD) techniques. The basic elements of the approach can be applied to a variety of pattern recognition problems. The specific application covered in this paper is word spotting in bitmapped fax documents.
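
The general idea of invariant representations grouped into subspaces, with SVD used for recognition, can be illustrated with a small sketch. This is not the paper's method. It assumes the DFT magnitude as a stand-in shift-invariant representation (the paper relies on time-frequency and scale transform techniques, and this sketch does not handle scale), and it assumes one subspace per word fitted by a plain truncated SVD. The 1-D synthetic "word" signals and all names (`shift_invariant_features`, `fit_subspace`, `residual`, `classify`, `word_a`, `word_b`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def shift_invariant_features(signal):
    """DFT magnitude: identical for any circular shift of the input."""
    return np.abs(np.fft.fft(signal))

def fit_subspace(feature_vectors, rank):
    """Orthonormal basis spanning the leading `rank` left singular vectors."""
    X = np.column_stack(feature_vectors)           # one example per column
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]

def residual(basis, x):
    """Norm of the part of x lying outside the subspace spanned by basis."""
    return float(np.linalg.norm(x - basis @ (basis.T @ x)))

def classify(subspaces, x):
    """Pick the label whose subspace leaves the smallest residual."""
    return min(subspaces, key=lambda label: residual(subspaces[label], x))

# Assumption: two synthetic 1-D signals stand in for two different words.
rng = np.random.default_rng(0)
n = np.arange(64)
templates = {
    "word_a": np.sin(2 * np.pi * 3 * n / 64) + np.sin(2 * np.pi * 7 * n / 64),
    "word_b": np.sin(2 * np.pi * 5 * n / 64) + np.sin(2 * np.pi * 11 * n / 64),
}

def noisy_shifted(template):
    """Randomly shifted copy with additive noise, a crude proxy for degradation."""
    shift = int(rng.integers(0, 64))
    return np.roll(template, shift) + 0.05 * rng.standard_normal(64)

# Fit one low-rank subspace per word from a handful of degraded examples.
subspaces = {
    label: fit_subspace(
        [shift_invariant_features(noisy_shifted(t)) for _ in range(6)], rank=2
    )
    for label, t in templates.items()
}

query = shift_invariant_features(noisy_shifted(templates["word_a"]))
print(classify(subspaces, query))
```

Classifying by projection residual keeps the decision rule simple once the subspaces are fixed. The harder problems the abstract points to, keeping the subspaces from overlapping and identifying the right one under font and fax degradation, are not addressed by this sketch.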