Results of applying probabilistic IR to OCR text
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The handwritten trie: indexing electronic ink
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Phonetic string matching: lessons from information retrieval
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
How to read less and know more: approximate OCR for Thai
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Improving retrieval on imperfect speech transcriptions (poster abstract)
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence
Phonetic confusion matrix based spoken document retrieval
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Writer dependent recognition of on-line unconstrained handwriting
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 06
Evaluation of Fusion for Similarity Searching in Online Handwritten Documents
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
Hi-index | 0.00 |
The Pen Technologies group at IBM Research has recently been investigating methods for retrieving handwritten documents based on user queries. This paper investigates the use of typed and handwritten queries to retrieve relevant handwritten documents. The IBM handwriting recognition engine was used to generate N-best lists for the words in each of 108 short documents. These N-best lists are concise statistical representations of the handwritten words. These statistical representations enable the retrieval methods to be robust when there are machine transcription errors, allowing retrieval of documents that would be missed by a traditional transcription-based retrieval system. Our experimental results demonstrate that significant improvements in retrieval performance can be achieved compared to standard keyword text searching of machine-transcribed documents. We have developed a software architecture for a multimedia document retrieval framework into which machine learning algorithms for feature extraction and matching may be easily integrated. The framework provides a "plug-and-play" mechanism for the integration of new media types, new feature extraction methods, and new document types.