Recognition and retrieval of historical handwritten material remains an unsolved problem. We propose a novel approach to recognizing and retrieving handwritten manuscripts in which word image classification is the key step. Decision trees with normalized pixels as features form the basis of a highly accurate AdaBoost classifier, trained on a corpus of word images that have been resized and sampled at a pyramid of resolutions. To counter the highly skewed distribution of class frequencies, word classes with very few training samples are augmented with stochastically altered versions of the originals, which increases recognition performance substantially. On a standard corpus of 20 pages of handwritten material from the George Washington collection, recognition performance improves substantially over previously published results (75% vs. 65%). Following word recognition, retrieval is performed using a language model over the recognized words. Retrieval performance on this database also improves substantially over previously published results. Recognition and retrieval results on a more challenging database of 100 pages from the George Washington collection are also presented.
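The two preprocessing ideas in the abstract, pixel features sampled at a pyramid of resolutions and augmentation of rare word classes with stochastically altered copies, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the function names, the block-averaging pyramid with factors 1, 2 and 4, the use of a random one-pixel shift as the "stochastic alteration", and the `min_count` threshold. Word images are assumed to be already resized to a common size and normalized to [0, 1]. The boosted decision-tree classifier itself is omitted.

```python
import random

def downsample(img, factor):
    """Average non-overlapping factor x factor blocks of a 2D list image."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h - h % factor, factor):
        row = []
        for c in range(0, w - w % factor, factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def pyramid_features(img, factors=(1, 2, 4)):
    """Concatenate the pixel values of every pyramid level into one vector.

    The factors are hypothetical; the abstract does not state the levels used.
    """
    feats = []
    for f in factors:
        for row in downsample(img, f):
            feats.extend(row)
    return feats

def jitter(img, rng, max_shift=1, background=0.0):
    """Shift the image by a random offset, filling exposed pixels with background.

    A stand-in for the unspecified "stochastic alteration" in the abstract.
    """
    h, w = len(img), len(img[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[background] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = r - dy, c - dx
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = img[sr][sc]
    return out

def augment_rare_classes(samples, min_count, rng):
    """Pad every class to at least min_count samples with jittered copies.

    samples is a list of (image, label) pairs; originals are kept unchanged.
    """
    by_label = {}
    for img, label in samples:
        by_label.setdefault(label, []).append(img)
    out = list(samples)
    for label, imgs in by_label.items():
        for _ in range(max(0, min_count - len(imgs))):
            out.append((jitter(rng.choice(imgs), rng), label))
    return out
```

In this sketch, frequent classes are left untouched and only classes below the threshold receive synthetic samples, so the augmented set is less skewed without discarding any original data. The resulting feature vectors could then be passed to any boosted decision-tree implementation.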