Introduction to Information Retrieval
Approximate String Matching Techniques for Effective CLIR Among Indian Languages
WILF '07 Proceedings of the 7th international workshop on Fuzzy Logic and Applications: Applications of Fuzzy Sets Theory
Because of the lack of resources, cross-lingual information retrieval (CLIR) is a difficult task for many Indian languages. Google Translate provides an easy way to translate from Indian languages to English, but because of lexicon limitations most out-of-vocabulary words are transliterated letter by letter together with their suffixes, producing unusually long strings. Such a string often does not match its intended translation, which hurts retrieval. We propose an approach that extracts the correct word from these strings using word segmentation combined with approximate string matching based on the Soundex algorithm and Levenshtein distance. We evaluate our approach on three Indian languages and find an average improvement of 5.8% in MAP on the FIRE-2010 dataset.
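The abstract names the ingredients (segmentation, Soundex, Levenshtein distance) but not how they are combined. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes the transliterated string is an intended word followed by an attached suffix, tries every prefix as the candidate word, keeps vocabulary entries whose Soundex code matches that prefix, and ranks them by edit distance. The names `soundex`, `levenshtein`, `extract_word` and `VOCAB`, the `min_len` parameter, the toy vocabulary and the example strings such as `baangladeshmen` are all hypothetical illustrations.

```python
"""Hypothetical sketch: recover a dictionary word from a long transliterated
string using prefix segmentation, Soundex filtering and Levenshtein ranking.
Not the paper's code; segmentation and scoring here are assumptions."""

from typing import Iterable, Optional, Tuple

# American Soundex digit classes; vowels, y, h and w receive no digit.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word`."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w do not separate letters that share a code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts again
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def extract_word(token: str, vocabulary: Iterable[str],
                 min_len: int = 3) -> Optional[str]:
    """Hypothetical matcher: treat each prefix of `token` as the intended word
    (the remainder as an attached suffix), keep vocabulary words with the same
    Soundex code, and return the one with the smallest edit distance."""
    token = token.lower()
    vocab = [(w, w.lower(), soundex(w)) for w in vocabulary]
    best: Optional[Tuple[int, str]] = None  # (distance, vocabulary word)
    for end in range(min_len, len(token) + 1):
        prefix = token[:end]
        code = soundex(prefix)
        for word, lowered, word_code in vocab:
            if word_code != code:
                continue  # Soundex acts as a cheap phonetic filter
            dist = levenshtein(prefix, lowered)
            if best is None or dist < best[0]:
                best = (dist, word)
    return best[1] if best else None


VOCAB = ["Bangladesh", "Bangalore", "Delhi", "Pakistan"]  # toy vocabulary (hypothetical)

if __name__ == "__main__":
    print(soundex("Bangladesh"), soundex("Bangalore"))  # B524 B524
    print(extract_word("baangladeshmen", VOCAB))        # Bangladesh
```

In this toy example Soundex alone cannot separate "Bangladesh" from "Bangalore" (both encode to B524); the edit distance breaks the tie. A practical system would likely also want a distance threshold and a vocabulary drawn from the target-language collection, which this sketch omits.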