Today, web search engines provide the easiest way to reach information on the web. Yet more than 95% of Indian language content on the web is not searchable, because web pages use multiple encodings. Most of these encodings are proprietary and hence need some kind of standardization before the content can be made accessible via a search engine. In this paper we present a search engine called WebKhoj, which is capable of searching multi-script and multi-encoded Indian language content on the web. We describe a language-focused crawler and the transcoding processes involved in making Indian language content accessible. Finally, we report some of the experiments that were conducted, along with results on Indian language web content.
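To make the idea of transcoding concrete, the following is a minimal sketch of one common way to convert text in a font-specific encoding to Unicode: a lookup table applied with greedy longest-match. It is not WebKhoj's implementation, which the abstract does not detail. The table `HYPOTHETICAL_FONT_MAP` and the function `transcode` are names invented for this illustration, and the table entries do not correspond to any real font.

```python
# Hypothetical mapping from legacy font code sequences to Unicode Devanagari.
# These entries are invented for illustration and match no real font.
HYPOTHETICAL_FONT_MAP = {
    "k": "\u0915",               # DEVANAGARI LETTER KA
    "K": "\u0916",               # DEVANAGARI LETTER KHA
    "kx": "\u0915\u094D\u0937",  # conjunct: KA + VIRAMA + SSA
    "a": "\u093E",               # DEVANAGARI VOWEL SIGN AA
}


def transcode(text, table):
    """Convert legacy-encoded text to Unicode using greedy longest-match.

    At each position the longest key in `table` that matches is replaced
    by its Unicode value. Characters with no match are copied unchanged.
    """
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so multi-character sequences
        # (such as conjuncts) take priority over their single-character prefixes.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)
```

With one such table per proprietary encoding, pages could be normalized to a single Unicode representation before indexing, so that a single query matches content regardless of the encoding in which it was originally published.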