Keyword Spotting Techniques for Sanskrit Documents

Authors:
Anurag Bhardwaj; Srirangaraj Setlur; Venu Govindaraju
Affiliations:
Center for Unified Biometrics and Sensors, Department of Computer Science and Engineering, University at Buffalo, Amherst, NY 14228 (all three authors)
Venue:
Sanskrit Computational Linguistics
Year:
2009

Abstract

With advances in the digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts and languages such as Sanskrit has hampered information extraction from a large body of documents of cultural and historical importance. This chapter addresses two topics in this area. First, we describe a script-specific keyword spotting approach for Sanskrit documents that exploits domain knowledge of the script. Second, we address the need of a digital library to provide access to a collection of documents in multiple scripts, which requires solutions that scale across scripts. For this purpose we present a script-independent keyword spotting approach. Experimental results illustrate the efficacy of our methods.
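The abstract does not describe how matching is performed. As a rough, hypothetical illustration of query-by-example keyword spotting in general (not the authors' method), the sketch below describes each segmented binary word image with normalized central moments. These are generic shape descriptors that use no script-specific knowledge. It then ranks candidate word images by Euclidean distance to a query image. The choice of moments of order 2 and 3, the Euclidean distance, and the names `moment_features` and `spot` are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[int]]  # binary word image: 1 = ink pixel, 0 = background

# Assumed feature set: normalized central moments eta_pq with 2 <= p + q <= 3.
ORDERS: Tuple[Tuple[int, int], ...] = (
    (2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3),
)


def moment_features(img: Image) -> List[float]:
    """Return normalized central moments eta_pq of a binary word image.

    Central moments are taken about the ink centroid (translation invariance);
    dividing by m00 ** (1 + (p + q) / 2) normalizes for scale.
    """
    pixels = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = len(pixels)  # zeroth moment = number of ink pixels
    if m00 == 0:
        raise ValueError("image contains no ink pixels")
    xc = sum(x for x, _ in pixels) / m00  # centroid column
    yc = sum(y for _, y in pixels) / m00  # centroid row
    feats = []
    for p, q in ORDERS:
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pixels)
        feats.append(mu / m00 ** (1 + (p + q) / 2))
    return feats


def spot(query: Image, candidates: Sequence[Image], top_k: int = 5) -> List[Tuple[int, float]]:
    """Rank candidate word images by feature-space distance to the query.

    Returns (candidate index, distance) pairs, closest first.
    """
    qf = moment_features(query)
    scored = [(i, math.dist(qf, moment_features(c))) for i, c in enumerate(candidates)]
    scored.sort(key=lambda t: t[1])
    return scored[:top_k]
```

For example, with a horizontal stroke as the query, a vertically shifted copy of that stroke ranks first with distance 0.0 because the features are translation-invariant, while a vertical stroke scores farther away. The sketch assumes word segmentation and binarization happen upstream. A practical system would likely add richer features and a more robust distance.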