Approximate String Matching Techniques for Effective CLIR Among Indian Languages

  • Authors:
  • Ranbeer Makin;Nikita Pandey;Prasad Pingali;Vasudeva Varma

  • Affiliations:
  • International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India

  • Venue:
  • WILF '07 Proceedings of the 7th international workshop on Fuzzy Logic and Applications: Applications of Fuzzy Sets Theory
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Commonly used vocabulary in Indian language documents found on the web contain a number of words that have Sanskrit, Persian or English origin. However, such words may be written in different scripts with slight variations in spelling and morphology. In this paper we explore approximate string matching techniques to exploit this situation of relatively large number of cognates among Indian languages, which are higher when compared to an Indian language and a non-Indian language. We present an approach to identify cognates and make use of them for improving dictionary based CLIR when the query and documents both belong to two different Indian languages. We conduct experiments using a Hindi document collection and a set of Telugu queries and report the improvement due to cognate recognition and translation.