Experiments in CLIR using fuzzy string search based on surface similarity

Authors:
Sethuramalingam Subramaniam;Anil Kumar Singh;Pradeep Dasigi;Vasudeva Varma
Affiliations:
International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Citing 4
Cited 0

Information retrieval system evaluation: effort, sensitivity, and reliability

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Dynamic programming matching for large scale information retrieval

AsianIR '03 Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
Measuring language divergence by intra-lexical comparison

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Automatic construction of Japanese KATAKANA variant list from large corpus

COLING '04 Proceedings of the 20th international conference on Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Cross Language Information Retrieval (CLIR) between languages of the same origin is an interesting topic of research. The similarity of the writing systems used for these languages can be used effectively to not only improve CLIR, but to overcome the problems of textual variations, textual errors, and even the lack of linguistic resources like stemmers to an extent. We have conducted CLIR experiments between three languages which use writing systems (scripts) of Brahmi-origin, namely Hindi, Bengali and Marathi. We found significant improvements for all the six language pairs using a method for fuzzy text search based on Surface Similarity. In this paper we report these results and compare them with a baseline CLIR system and a CLIR system that uses Scaled Edit Distance (SED) for fuzzy string matching.