Most Internet data in Indian languages exists in a variety of encodings, which makes it difficult to find information through search engines. In the Indian scenario, the majority of web pages are not searchable, or the intended information is not retrieved efficiently by search engines, for the following reasons: (1) multiple text encodings are used when authoring websites; (2) although Indian languages share a common phonetic nature, common words such as loan words (borrowed from other languages such as Sanskrit, Urdu or English), transliterated terms and pronouns cannot be searched across languages; (3) finally, the query input mechanism is another major problem: most users hardly know how to type in their native language and prefer to access information through English-based transliteration. This paper addresses all these problems and presents a transliteration-based search engine (inSearch) capable of searching multi-script, multi-encoded web content in 10 Indian languages.