Transliteration based search engine for multilingual information access

  • Authors: Anand Arokia Raj; Harikrishna Maganti
  • Affiliations: Bhrigus Software (I) Pvt Ltd, Hyderabad, India (both authors)
  • Venue: CLIAWS3 '09 Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies
  • Year: 2009

Abstract

Most Internet data for Indian languages exists in various encodings, which makes the information difficult to find through search engines. In the Indian scenario, the majority of web pages are not searchable, or the intended information is not retrieved efficiently, for three reasons: (1) Multiple text encodings are used when authoring websites. (2) Although Indian languages share a common phonetic nature, common words such as loan words (borrowed from other languages such as Sanskrit, Urdu, or English), transliterated terms, and pronouns cannot be searched across languages. (3) The query input mechanism is another major problem: most users hardly know how to type in their native language and prefer to access information through English-based transliteration. This paper addresses all of these problems and presents a transliteration-based search engine (inSearch) capable of searching web content in 10 multi-script, multi-encoded Indian languages.
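
The abstract does not describe how inSearch matches words across scripts, so the following is only an illustrative sketch of one possible approach, not the authors' method. It relies on the fact that the Unicode blocks from Devanagari (U+0900) through Malayalam (U+0D7F) are each 128 code points wide and follow a shared ISCII-derived layout, so a letter usually sits at the same offset in every block. The function name `script_neutral_key` and the choice of Devanagari as the common target are assumptions made for this example; script-specific letters and legacy font encodings are not handled.

```python
# Hypothetical sketch: fold text from several Indic scripts onto one
# script-neutral key so the same word can match across scripts.
# This is NOT the inSearch implementation.

INDIC_START = 0x0900   # first code point of the Devanagari block
INDIC_END = 0x0D7F     # last code point of the Malayalam block
BLOCK_SIZE = 0x80      # each of these script blocks spans 128 code points


def script_neutral_key(text: str) -> str:
    """Map every character in the U+0900..U+0D7F range to the
    character at the same offset in the Devanagari block."""
    folded = []
    for ch in text:
        cp = ord(ch)
        if INDIC_START <= cp <= INDIC_END:
            offset = (cp - INDIC_START) % BLOCK_SIZE
            folded.append(chr(INDIC_START + offset))
        else:
            # Leave Latin letters, digits, punctuation, etc. unchanged.
            folded.append(ch)
    return "".join(folded)


# The letters ka + ma + la written in three scripts.
hindi = "\u0915\u092e\u0932"     # Devanagari
telugu = "\u0c15\u0c2e\u0c32"    # Telugu
kannada = "\u0c95\u0cae\u0cb2"   # Kannada

same_key = (
    script_neutral_key(hindi)
    == script_neutral_key(telugu)
    == script_neutral_key(kannada)
)
```

Under these assumptions, an index could store the folded key alongside the original text, so a query typed in one script retrieves documents written in another. Handling romanized (English-based) queries would need an additional Latin-to-Indic mapping, which this sketch leaves out.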