The use of a statistical language model to improve the performance of an algorithm for recognizing digital images of handwritten or machine-printed text is discussed. A word recognition algorithm first determines, for each input word image, a set of visually similar words from a lexicon (called a neighborhood). Syntactic classifications for the words and the transition probabilities between those classifications are input to the Viterbi algorithm, which determines the sequence of syntactic classes (the states of an underlying Markov process) for each sentence that has the maximum a posteriori probability given the observed neighborhoods. The performance of the word recognition algorithm is then improved by removing from each neighborhood the words whose classes do not appear on the estimated state sequence.

An experimental application is demonstrated with a neighborhood generation algorithm that produces a number of guesses about the identity of each word in a running text. The effects of zero-, first-, and second-order transition probabilities, and of different levels of noise in the neighborhood estimates, are explored.
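To make the decode-then-prune step concrete, the following is a minimal sketch of first-order Viterbi decoding over syntactic classes with neighborhood observations, followed by the neighborhood pruning the abstract describes. The class inventory, transition probabilities, toy lexicon, and the simple match-fraction emission model are all illustrative assumptions, not the paper's actual estimates.

```python
# Sketch of the post-processing step: Viterbi over syntactic classes,
# then pruning each neighborhood to words on the MAP class sequence.
# All probabilities and the lexicon below are hypothetical.
import math

CLASSES = ["DET", "NOUN", "VERB"]          # states of the Markov process

# Assumed first-order transition probabilities P(class_t | class_{t-1}).
TRANS = {
    "DET":  {"DET": 0.01, "NOUN": 0.89, "VERB": 0.10},
    "NOUN": {"DET": 0.10, "NOUN": 0.20, "VERB": 0.70},
    "VERB": {"DET": 0.60, "NOUN": 0.30, "VERB": 0.10},
}
INIT = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}

# Toy lexicon mapping each word to its syntactic class.
LEXICON = {"the": "DET", "tree": "NOUN", "three": "NOUN",
           "free": "VERB", "sees": "VERB"}

def emission(neighborhood, cls):
    """P(observed neighborhood | class): here simply the fraction of
    neighborhood words whose lexical class matches cls (an assumption;
    the paper derives this from the recognizer's behavior)."""
    matches = sum(1 for w in neighborhood if LEXICON.get(w) == cls)
    return matches / len(neighborhood) if matches else 1e-9

def viterbi(neighborhoods):
    """Return the MAP sequence of syntactic classes for one sentence,
    given the neighborhood observed for each word position."""
    # delta[c] = best log-probability of any class sequence ending in c
    delta = {c: math.log(INIT[c]) + math.log(emission(neighborhoods[0], c))
             for c in CLASSES}
    back = []
    for nbhd in neighborhoods[1:]:
        prev_delta, ptr, delta = delta, {}, {}
        for c in CLASSES:
            best_prev = max(CLASSES,
                            key=lambda p: prev_delta[p] + math.log(TRANS[p][c]))
            ptr[c] = best_prev
            delta[c] = (prev_delta[best_prev] + math.log(TRANS[best_prev][c])
                        + math.log(emission(nbhd, c)))
        back.append(ptr)
    # Trace back the highest-probability state sequence.
    state = max(delta, key=delta.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

# One sentence, one neighborhood of word guesses per position.
sentence = [["the", "free"], ["tree", "three", "sees"], ["sees", "free"]]
path = viterbi(sentence)                   # e.g. ['DET', 'NOUN', 'VERB']
pruned = [[w for w in nbhd if LEXICON.get(w) == c]
          for nbhd, c in zip(sentence, path)]
print(path, pruned)
```

With these made-up numbers the decoder picks DET, NOUN, VERB, so the first neighborhood is pruned to ["the"] and the second to ["tree", "three"]: words whose classes fall off the estimated state sequence are discarded, which is the performance improvement the abstract claims. The paper's zero- and second-order variants would replace the first-order TRANS table with unigram class probabilities or class-trigram transitions, respectively.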