Experiments in CLIR using fuzzy string search based on surface similarity

  • Authors: Sethuramalingam Subramaniam; Anil Kumar Singh; Pradeep Dasigi; Vasudeva Varma
  • Affiliations: International Institute of Information Technology, Hyderabad, India (all four authors)
  • Venue: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year: 2009

Abstract

Cross-Language Information Retrieval (CLIR) between languages of the same origin is an interesting topic of research. The similarity of the writing systems used for these languages can be exploited not only to improve CLIR, but also to overcome, to an extent, the problems of textual variation, textual errors, and even the lack of linguistic resources such as stemmers. We conducted CLIR experiments among three languages that use writing systems (scripts) of Brahmi origin, namely Hindi, Bengali and Marathi. We found significant improvements for all six language pairs using a method for fuzzy text search based on Surface Similarity. In this paper we report these results and compare them with a baseline CLIR system and with a CLIR system that uses Scaled Edit Distance (SED) for fuzzy string matching.
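
To make the idea of fuzzy string matching concrete, the following is a minimal sketch of edit-distance-based term lookup. It is not the paper's Surface Similarity method, and the scaling shown (edit distance divided by the length of the longer string) is only an assumption; the abstract does not define how SED is scaled. The function names, the threshold value, and the romanized example vocabulary are all hypothetical.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled into [0, 1] by the longer string's length.

    Assumption: this scaling is illustrative and may differ from the SED
    used in the paper.
    """
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def fuzzy_lookup(term: str, vocabulary: Iterable[str],
                 threshold: float = 0.34) -> List[str]:
    """Return vocabulary entries within the (hypothetical) distance threshold,
    closest first; ties are broken alphabetically."""
    scored = sorted((normalized_distance(term, word), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= threshold]


if __name__ == "__main__":
    # Romanized toy vocabulary; real index terms would be in native scripts.
    print(fuzzy_lookup("color", ["colour", "colon", "dolor", "cat"]))
```

In a CLIR setting, a lookup like this could be applied to a query term against the index vocabulary of the target language, so that spelling variants and near-cognates still match. Python compares strings by code point, so terms in different Brahmi-origin scripts would first need to be mapped to a common representation before such a comparison is meaningful.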