Approximate String Matching Techniques for Effective CLIR Among Indian Languages

Authors:
Ranbeer Makin;Nikita Pandey;Prasad Pingali;Vasudeva Varma
Affiliations:
International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India
Venue:
WILF '07 Proceedings of the 7th international workshop on Fuzzy Logic and Applications: Applications of Fuzzy Sets Theory
Year:
2007

Citing 6
Cited 1

Querying across languages: a dictionary-based approach to multilingual information retrieval

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Fuzzy translation of cross-lingual spelling variants

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Bitext maps and alignment via pattern recognition

Computational Linguistics
Combining clues for word alignment

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Multipath translation lexicon induction via bridge languages

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies

Handling OOV words in Indian-language - English CLIR

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Abstract

The commonly used vocabulary in Indian language documents found on the web contains many words of Sanskrit, Persian, or English origin. However, such words may be written in different scripts, with slight variations in spelling and morphology. In this paper we explore approximate string matching techniques that exploit the relatively large number of cognates shared among Indian languages, a number that is higher than the number shared between an Indian language and a non-Indian language. We present an approach to identify cognates and use them to improve dictionary-based CLIR when the query and the documents are in two different Indian languages. We conduct experiments using a Hindi document collection and a set of Telugu queries, and report the improvement due to cognate recognition and translation.
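
The abstract does not say which approximate string matching techniques are used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that words from both languages have already been transliterated into a common Roman representation (an assumption made here for illustration), and it uses normalized Levenshtein similarity with an arbitrary threshold of 0.7. The function names and the sample words are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def find_cognates(query_term, vocabulary, threshold=0.7):
    """Return (word, score) pairs at or above the threshold, best first.

    The threshold value is an assumption chosen for illustration only.
    """
    scored = [(word, similarity(query_term, word)) for word in vocabulary]
    hits = [(word, score) for word, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical romanized forms: a Telugu query term and a small Hindi vocabulary.
candidates = find_cognates("pustakam", ["pustak", "kitab", "pani"])
print(candidates)
```

In a dictionary-based CLIR setting, a match found this way could serve as a translation candidate for a query term that the bilingual dictionary does not cover. How such candidates are combined with dictionary translations is not described in the abstract.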