With the continuous growth of the World Wide Web, there is an urgent need for an efficient information retrieval system that can search and retrieve handwritten documents in response to user queries. However, unconstrained handwriting recognition remains a challenging task with inadequate performance (around 30% accuracy), which is a major hurdle to providing a robust search experience over handwritten documents. In this paper, we describe our recent research on information retrieval from the noisy text output by imperfect recognizers applied to handwritten document images. We describe three techniques, each exploring a different approach to the noisy text retrieval task. The first method uses a novel bootstrapping mechanism to refine the OCR'ed text and then uses the cleaned text for retrieval. The second method uses the uncorrected (raw) OCR'ed text but modifies the standard vector space model to handle noisy text. The third method employs robust image features to index the documents instead of using noisy OCR'ed text. We describe these approaches in detail and present their performance using standard IR evaluation metrics.
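To make the starting point of the second method concrete, the following is a minimal sketch of a standard vector space model (TF-IDF weights with cosine similarity) applied to raw recognizer output. It is not the modification described in this paper, which the abstract does not specify. As an assumption for illustration only, terms are character trigrams rather than whole words, a commonly used way to let a misrecognized word still share some terms with the query. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, word by word.

    Each word is padded with '_' so that word boundaries become terms too.
    Using n-grams instead of whole words is an assumption of this sketch.
    """
    grams = []
    for word in text.lower().split():
        padded = f"_{word}_"
        grams.extend(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))
    return grams


def build_index(docs, n=3):
    """Return (TF-IDF vectors, IDF table) for a list of document strings."""
    term_counts = [Counter(char_ngrams(d, n)) for d in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    num_docs = len(docs)
    # Smoothed IDF so that every term keeps a positive weight.
    idf = {t: math.log((1 + num_docs) / (1 + f)) + 1.0 for t, f in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in counts.items()} for counts in term_counts]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf, n=3):
    """Rank documents by cosine similarity; returns (score, doc_index) pairs."""
    q_counts = Counter(char_ngrams(query, n))
    # Query terms never seen in the collection are dropped.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical recognizer output: "pain" was misread as "pam" in document 0.
docs = [
    "patient reports chest pam and fever",
    "invoice for office supplies",
    "weather forecast sunny",
]
vectors, idf = build_index(docs)
ranking = search("chest pain", vectors, idf)
print(ranking)
```

In this toy collection the word "pain" never appears verbatim, yet document 0 still ranks first because it shares the trigrams of "chest" and the leading trigram of "pain" with the query. A word-level index would match only on "chest".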