Survey and bibliography of Arabic optical text recognition
Signal Processing
IEEE Transactions on Pattern Analysis and Machine Intelligence
Omnifont and Unlimited-Vocabulary OCR for English and Arabic
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
An Experimental HMM-Based Postal OCR System
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) -Volume 4 - Volume 4
Language-Independent OCR Using a Continuous Speech Recognition System
ICPR '96 Proceedings of the International Conference on Pattern Recognition (ICPR '96) Volume III-Volume 7276 - Volume 7276
Modelling polyfont printed characters with HMMs and a shift invariant Hamming distance
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1
Printed PAW Recognition Based on Planar Hidden Markov Models
ICPR '96 Proceedings of the 13th International Conference on Pattern Recognition - Volume 2
Modeling and recognition of cursive words with hidden Markov models
Pattern Recognition
Twenty Years of Document Image Analysis in PAMI
IEEE Transactions on Pattern Analysis and Machine Intelligence
Multilingual machine printed OCR
Hidden Markov models
Coarse-to-Fine Dynamic Programming
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Pattern Analysis and Machine Intelligence
Automatic Completion of Korean Words for Open Vocabulary Pen Interface
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Offline Recognition of Syntax-Constrained Cursive Handwritten Text
Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition
Style Context with Second-Order Statistics
IEEE Transactions on Pattern Analysis and Machine Intelligence
Style Consistent Classification of Isogenous Patterns
IEEE Transactions on Pattern Analysis and Machine Intelligence
Probabilistic Finite-State Machines-Part II
IEEE Transactions on Pattern Analysis and Machine Intelligence
Texture for Script Identification
IEEE Transactions on Pattern Analysis and Machine Intelligence
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
On Appearance-Based Feature Extraction Methods for Writer-Independent Handwritten Text Recognition
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Offline Arabic Handwriting Recognition: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence
Document zone content classification and its performance evaluation
Pattern Recognition
Rejection strategies for offline handwritten text line recognition
Pattern Recognition Letters
Offline recognition of omnifont Arabic text using the HMM ToolKit (HTK)
Pattern Recognition Letters
ACM Transactions on Asian Language Information Processing (TALIP)
A pictorial dictionary for printed Farsi subwords
Pattern Recognition Letters
Recognition of off-line printed Arabic text using Hidden Markov Models
Signal Processing
Holistic approach for classifying and retrieving personal Arabic handwritten documents
AIKED'08 Proceedings of the 7th WSEAS International Conference on Artificial intelligence, knowledge engineering and data bases
Computer Assisted Transcription of Text Images and Multimodal Interaction
MLMI '08 Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction
Classification of personal Arabic handwritten documents
WSEAS Transactions on Information Science and Applications
Expert Systems with Applications: An International Journal
Multimodal interactive transcription of text images
Pattern Recognition
HMM-based system for recognizing words in historical Arabic manuscript
International Journal of Robotics and Automation
CAIP'07 Proceedings of the 12th international conference on Computer analysis of images and patterns
Recognition of handwritten Arabic (Indian) numerals using Radon-Fourier-based features
ISPRA'10 Proceedings of the 9th WSEAS international conference on Signal processing, robotics and automation
The use of radon transform in handwritten Arabic (Indian) numerals recognition
WSEAS Transactions on Computers
Recognition of Arabic (Indian) bank check digits using log-Gabor filters
Applied Intelligence
Mono-font cursive Arabic text recognition using speech recognition system
SSPR'06/SPR'06 Proceedings of the 2006 joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition
A robust free size OCR for omni-font Persian/Arabic printed document using combined MLP/SVM
CIARP'05 Proceedings of the 10th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis and Applications
Spontaneous handwriting text recognition and classification using finite-state models
IbPRIA'05 Proceedings of the Second Iberian conference on Pattern Recognition and Image Analysis - Volume Part II
Natural language inspired approach for handwritten text line detection in legacy documents
LaTeCH '12 Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Computer assisted transcription for ancient text images
ICIAR'07 Proceedings of the 4th international conference on Image Analysis and Recognition
Offline Arabic handwritten text recognition: A Survey
ACM Computing Surveys (CSUR)
A data acquisition and analysis system for palm leaf documents in Telugu
Proceeding of the workshop on Document Analysis and Recognition
KHATT: An open Arabic offline handwritten text database
Pattern Recognition
We present an omnifont, unlimited-vocabulary OCR system for English and Arabic. The system is based on hidden Markov models (HMMs), an approach that has proven very successful in automatic speech recognition. In this paper we focus on two aspects of the OCR system. First, we address how to perform OCR on omnifont and multi-style data, such as plain and italic text, without needing a separate model for each style. When a single model is trained on data from several styles, the amount of training data drawn from each style becomes an important issue because of the conditional independence assumption inherent in HMMs. We demonstrate mathematically and empirically how to allocate training data among the different styles to alleviate this problem. Second, we show how to use a word-based HMM system to perform character recognition with unlimited vocabulary; the method includes a trigram language model on character sequences. Using all these techniques, we achieved character error rates of 1.1 percent on data from the University of Washington English Document Image Database and 3.3 percent on data from the DARPA Arabic OCR Corpus.
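The following is a simplified illustration of why style proportions matter under the HMM conditional independence assumption. It is not the paper's own derivation. Assume that each state $q_t$ emits a frame $\mathbf{x}_t$ from a two-component mixture with an italic component $p_I$ and a plain component $p_P$. Assume also that the italic weight $w$ ends up roughly equal to the italic share of the training frames, which is what maximum-likelihood (EM) training yields when the components are well separated. Because the $T$ frames of a sample are treated as conditionally independent given the state sequence $Q = q_{1:T}$, the per-frame weight is multiplied $T$ times:

```latex
\begin{align}
p(\mathbf{x}_t \mid q_t) &= w\,p_I(\mathbf{x}_t \mid q_t) + (1 - w)\,p_P(\mathbf{x}_t \mid q_t) \\
P(X \mid Q) &= \prod_{t=1}^{T} p(\mathbf{x}_t \mid q_t) \\
\log P(X \mid Q) &\approx T \log w + \sum_{t=1}^{T} \log p_I(\mathbf{x}_t \mid q_t)
  \quad \text{if } (1 - w)\,p_P(\mathbf{x}_t \mid q_t) \ll w\,p_I(\mathbf{x}_t \mid q_t)\ \forall t
\end{align}
```

Frames explained by the minority component are therefore handicapped by $\log\frac{1-w}{w}$ nats per frame, compared with frames explained by the majority component. With $w = 0.1$ and $T = 20$ frames, the gap is $20 \ln 9 \approx 43.9$ nats. A gap of this size can let a wrong character's plain-style component outscore the correct character's italic component. The specific allocation rule the authors derive is not reproduced here.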
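The next example sketches a trigram language model on character sequences of the kind the abstract mentions. The abstract does not specify a smoothing method, so add-k smoothing is an assumption here. The class name `CharTrigramLM` and its methods are hypothetical and are not taken from the authors' system.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary symbols


class CharTrigramLM:
    """Add-k smoothed character trigram model (illustrative sketch, not the paper's LM)."""

    def __init__(self, k=0.1):
        self.k = k
        self.tri = Counter()   # counts of (c_{i-2}, c_{i-1}, c_i)
        self.bi = Counter()    # counts of the history (c_{i-2}, c_{i-1})
        self.vocab = set()     # symbols that can be predicted (characters + EOS)

    def _pad(self, text):
        return [BOS, BOS] + list(text) + [EOS]

    def fit(self, corpus):
        for line in corpus:
            toks = self._pad(line)
            self.vocab.update(toks[2:])
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1
        return self

    def logprob(self, text):
        """Natural-log probability of `text` under the smoothed trigram model."""
        # +1 reserves probability mass for one unknown-character class.
        v = len(self.vocab) + 1
        toks = self._pad(text)
        total = 0.0
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            # Counter returns 0 for unseen n-grams without inserting them.
            total += math.log((self.tri[(a, b, c)] + self.k) /
                              (self.bi[(a, b)] + self.k * v))
        return total


lm = CharTrigramLM().fit(["the cat", "the hat", "that"])
print(lm.logprob("the") > lm.logprob("eht"))
```

In an HMM decoder, scores like this are commonly added to the acoustic (here, optical) log-likelihood, often with a tunable weight. That weighting is a general speech-recognition practice and is not a detail confirmed by the abstract.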
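The reported character error rates can be computed with a widely used definition: the total Levenshtein edit distance between reference and recognized text, divided by the number of reference characters. The abstract does not describe the exact scoring procedure, so treat this as an assumption. The function names are hypothetical.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute (or match at cost 0)
        prev = cur
    return prev[-1]


def character_error_rate(refs, hyps):
    """Summed edit distance over lines divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference text is empty")
    errors = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return errors / total


print(character_error_rate(["abc", "de"], ["abc", "dx"]))  # one substitution in 5 chars
```

Under this definition, the 1.1 percent figure corresponds to roughly 11 character edits per 1,000 reference characters.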