Machine vision for keyword spotting using pseudo 2D hidden Markov models

Authors:
Shyh-shiaw Kuo;Oscar E. Agazzi
Affiliations:
AT&T Bell Laboratories, Middletown, NJ and Signal Processing Research Department, AT&T Bell Laboratories, Murray Hill, NJ;Signal Processing Research Department, AT&T Bell Laboratories, Murray Hill, NJ
Venue:
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: image and multidimensional signal processing - Volume V
Year:
1993

Citing 2
Cited 1

Discriminative template training for dynamic programming speech recognition
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1

Connected and degraded text recognition using planar hidden Markov models
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: image and multidimensional signal processing - Volume V

Modeling and recognition of cursive words with hidden Markov models
Pattern Recognition

Abstract

An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, called pseudo 2D hidden Markov models (P2DHMMs), are created: one represents the keyword itself and the other represents all extraneous words. Dynamic programming is then used to match an unknown input word against the two models and make a maximum likelihood decision. Although the models are pseudo 2D in the sense that they are not fully connected two-dimensional networks, they are shown to be general enough to characterize printed words efficiently. These models provide an "elastic matching" property in both the horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database of about 26,000 words. Currently, we achieve a recognition accuracy of 99% when the words in the testing and training sets are in the same font size, and 96% when they are in different sizes. In the latter case, the conventional 1D HMM approach [1] achieves an accuracy of only 70%.
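
The matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes binary pixels, hand-set (untrained) Bernoulli emission probabilities, fixed transition probabilities of 0.5, and a single-state uniform model standing in for the extraneous-word model. All names (`viterbi_left_to_right`, `p2d_score`, `is_keyword`, `KEYWORD`, `FILLER`) are hypothetical. Each horizontal "superstate" owns a vertical left-to-right HMM that scores one pixel column, and a second dynamic-programming pass aligns the columns with the superstates, which is what allows stretching in both directions.

```python
import math

NEG_INF = float("-inf")
LOG_HALF = math.log(0.5)  # assumed transition log-probability (stay or advance)


def viterbi_left_to_right(emission_logp, log_stay=LOG_HALF, log_next=LOG_HALF):
    """Best-path log score of a left-to-right HMM.

    emission_logp[t][s] is log P(observation t | state s). The path must
    start in the first state and end in the last one.
    """
    n_states = len(emission_logp[0])
    score = [NEG_INF] * n_states
    score[0] = emission_logp[0][0]
    for t in range(1, len(emission_logp)):
        new = [NEG_INF] * n_states
        for s in range(n_states):
            stay = score[s] + log_stay
            move = score[s - 1] + log_next if s > 0 else NEG_INF
            new[s] = max(stay, move) + emission_logp[t][s]
        score = new
    return score[-1]


def pixel_logp(pixel, p_black):
    """Bernoulli log-likelihood of one binary pixel (1 = black)."""
    return math.log(p_black if pixel else 1.0 - p_black)


def column_score(column, superstate):
    """Vertical pass: align the pixels of one column with the states of a superstate."""
    emissions = [[pixel_logp(px, p) for p in superstate] for px in column]
    return viterbi_left_to_right(emissions)


def p2d_score(columns, model):
    """Horizontal pass: align image columns with superstates, using the
    vertical Viterbi scores as column emission log-probabilities."""
    emissions = [[column_score(col, ss) for ss in model] for col in columns]
    return viterbi_left_to_right(emissions)


def is_keyword(columns, keyword_model, filler_model):
    """Maximum likelihood decision between the keyword and filler models."""
    return p2d_score(columns, keyword_model) > p2d_score(columns, filler_model)


# Toy models: each superstate is a list of per-state P(black) values.
# The "keyword" is a vertical stroke with blank margins on both sides.
BLANK = [0.1]
STROKE = [0.1, 0.9, 0.1]  # white, black, white from top to bottom
KEYWORD = [BLANK, STROKE, BLANK]
FILLER = [[0.5]]  # one uninformative state, an assumed stand-in for "other words"

small = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
stretched = [[0] * 6, [0] * 6, [0, 0, 1, 1, 1, 0], [0, 0, 1, 1, 1, 0], [0] * 6]
all_black = [[1, 1, 1, 1]] * 3

if __name__ == "__main__":
    for name, image in [("small", small), ("stretched", stretched), ("all_black", all_black)]:
        print(name, is_keyword(image, KEYWORD, FILLER))
```

In this toy setup the stroke is accepted at both sizes because the stay transitions absorb the extra rows and columns, while the solid black image is better explained by the filler model. A real system would estimate the emission and transition parameters from training data rather than fixing them by hand.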