A comparison of discrete and continuous hidden Markov models for phrase spotting in text images

Authors:
F. R. Chen; L. D. Wilcox; D. S. Bloomberg
Venue:
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1)
Year:
1995

Abstract

In phrase spotting in text images, speed and accuracy are both important considerations. In a hidden Markov model (HMM) based spotter, recognition time is dominated by the time required to compute the state-conditional observation probabilities, which measure how well the data match each state in the model. This paper compares discrete and continuous HMMs on speed and accuracy in phrase spotting in text images. For the discrete HMM, vector quantization is used to associate each continuous feature vector with a discrete value. For the continuous HMMs, the observation distributions for the feature vectors are modeled by either a single Gaussian or a mixture of two Gaussians. Comparisons were made on a subset of the UW English Document Image Database I. The best accuracy was observed when a mixture of two Gaussians was used in the continuous HMM. The discrete HMM provides faster spotting, particularly when long phrases are used.
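
The cost difference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy codebook, the per-state probability table, and the choice of diagonal covariances are all assumptions made for illustration. The discrete path quantizes a feature vector once and then does one table lookup per state, while the continuous path evaluates every Gaussian component of every state.

```python
import math

def quantize(x, codebook):
    """Return the index of the codeword nearest to x (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(x, codebook[k]))

def discrete_obs_prob(x, codebook, state_table):
    """Discrete HMM: quantize once, then look up one table entry per state."""
    k = quantize(x, codebook)
    return [row[k] for row in state_table]

def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance (an assumption of this sketch)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def mixture_obs_prob(x, state_mixtures):
    """Continuous HMM: evaluate every mixture component for every state."""
    return [
        sum(w * diag_gaussian_pdf(x, m, v) for w, m, v in components)
        for components in state_mixtures
    ]

# Hypothetical toy model: two states, two-dimensional features.
codebook = [[0.0, 0.0], [1.0, 1.0]]        # two VQ codewords
state_table = [[0.9, 0.1], [0.2, 0.8]]     # P(codeword | state)
state_mixtures = [
    [(1.0, [0.0, 0.0], [1.0, 1.0])],       # state 0: single Gaussian
    [(0.5, [1.0, 1.0], [1.0, 1.0]),        # state 1: mixture of two Gaussians
     (0.5, [1.0, 1.5], [0.5, 0.5])],
]

x = [0.9, 1.2]
discrete_probs = discrete_obs_prob(x, codebook, state_table)
continuous_probs = mixture_obs_prob(x, state_mixtures)
```

In this sketch the quantization cost is paid once per feature vector regardless of the number of states, whereas the continuous evaluation grows with the number of states times the number of mixture components. This is consistent with the abstract's observation that the discrete HMM is faster for long phrases, which involve more states.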