Many museum and library archives are digitizing their large collections of handwritten historical manuscripts to enable public access. Because these collections are available only as images, searching them requires expensive manual annotation. Current handwriting recognizers have word error rates in excess of 50% and therefore cannot be used for such material. We describe two statistical models for retrieval in large collections of handwritten manuscripts given a text query. Both use a set of transcribed page images to learn a joint probability distribution over features computed from word images and their transcriptions. The models can then be used to retrieve unlabeled images of handwritten documents given a text query. In experiments with a training set of 100 transcribed pages and a test set of 987 handwritten page images from the George Washington collection, precision at 20 documents is about 0.4 to 0.5, depending on the model. To the best of our knowledge, this is the first automatic retrieval system for historical manuscripts that uses text queries without manual transcription of the original corpus.
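As a rough illustration of the idea in the abstract, the sketch below estimates a smoothed conditional distribution of transcription words given discretized word-image features from transcribed training pairs, ranks unlabeled pages for a one-word text query, and computes precision at k. It is a toy under stated assumptions and not the models described in the paper: the discrete feature tokens, the add-alpha smoothing, the expected-count page score, and all function names are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def train_joint_counts(pairs):
    """Count co-occurrences of (feature_token, word) pairs.

    `pairs` comes from transcribed word images; each feature token is
    assumed to be a discretized image feature (an assumption of this sketch).
    """
    joint = defaultdict(Counter)
    for feat, word in pairs:
        joint[feat][word] += 1
    return joint


def word_given_feature(joint, feat, word, vocab_size, alpha=1.0):
    """Estimate P(word | feat) with add-alpha smoothing."""
    counts = joint.get(feat, Counter())
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * vocab_size)


def score_page(joint, page_feats, query, vocab_size):
    """Expected number of occurrences of `query` among a page's word images."""
    return sum(word_given_feature(joint, f, query, vocab_size) for f in page_feats)


def rank_pages(joint, pages, query, vocab_size):
    """Return page ids sorted by descending score for a one-word query."""
    return sorted(
        pages,
        key=lambda pid: score_page(joint, pages[pid], query, vocab_size),
        reverse=True,
    )


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked[:k]
    return sum(1 for pid in top if pid in relevant) / k


if __name__ == "__main__":
    # Hypothetical training pairs and unlabeled pages, for illustration only.
    training = [("f1", "fort"), ("f1", "fort"), ("f2", "river"), ("f3", "letter")]
    unlabeled = {"p1": ["f1", "f3"], "p2": ["f2", "f2"], "p3": ["f1", "f1"]}
    model = train_joint_counts(training)
    ranking = rank_pages(model, unlabeled, "fort", vocab_size=3)
    print(ranking, precision_at_k(ranking, {"p1", "p3"}, k=2))
```

Precision at 20, the measure reported in the abstract, corresponds to calling `precision_at_k` with `k=20` on each query's ranking and averaging over queries.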