A Search Engine for Indian Languages

  • Authors:
  • Ashwani Mujoo; Manoj Kumar Malviya; Rajat Moona; T. V. Prabhakar

  • Venue:
  • EC-WEB '00 Proceedings of the First International Conference on Electronic Commerce and Web Technologies
  • Year:
  • 2000

Abstract

There is a great need for search engines that handle web documents written in languages other than English. In this paper, we describe the design issues of a search engine for Indian languages. We also describe the implementation of two such search engines, one for documents encoded in ISCII and the other for documents encoded in Unicode. The software allows full-text indexing and searching of a database of documents written in any Brahmi-based Indian language. The search engine gathers HTML documents from the web, indexes and compresses them, and then searches them for the given keywords. Its main features are phonetic tolerance, morphological analysis, compression and indexing, leading and trailing substring matches for keywords, and search through compressed documents. The implementation includes a search architecture that can be accessed from a WYSIWYG front end implemented as a Java Swing applet. Performance results show that the search engine achieves a compression of almost 80 percent with appreciable precision and recall.
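
The abstract does not say how phonetic tolerance or leading and trailing substring matching are implemented. As a rough illustration only, the following Python sketch builds a tiny in-memory inverted index over Unicode Devanagari text. Everything in it is an assumption made for illustration: the names `PHONETIC_FOLD`, `normalize`, `tokenize`, and `TinyIndex` are hypothetical, the folding table (long vowels folded to short, candrabindu folded to anusvara) is a guess at what "phonetic tolerance" could mean, and "leading" and "trailing" are read here as the keyword being a prefix or a suffix of an indexed word. It does not cover ISCII input, morphological analysis, or compression.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical folding table (NOT taken from the paper): merges Devanagari
# signs that are often confused in spelling, so both variants index alike.
PHONETIC_FOLD = str.maketrans({
    "\u0940": "\u093F",  # vowel sign II -> vowel sign I
    "\u0942": "\u0941",  # vowel sign UU -> vowel sign U
    "\u0908": "\u0907",  # letter II -> letter I
    "\u090A": "\u0909",  # letter UU -> letter U
    "\u0901": "\u0902",  # candrabindu -> anusvara
})

# Split on whitespace, danda / double danda, and common ASCII punctuation.
# A plain \w+ pattern is avoided because it would break words at vowel signs.
SEPARATORS = re.compile(r"[\s\u0964\u0965.,;:!?()]+")


def normalize(word):
    """Return the phonetically folded, NFC-normalized form of a word."""
    return unicodedata.normalize("NFC", word).translate(PHONETIC_FOLD)


def tokenize(text):
    """Split text into non-empty word tokens."""
    return [token for token in SEPARATORS.split(text) if token]


class TinyIndex:
    """Inverted index: normalized term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.postings[normalize(token)].add(doc_id)

    def search(self, keyword, mode="exact"):
        key = normalize(keyword)
        if mode == "exact":
            return set(self.postings.get(key, set()))
        if mode == "leading":      # keyword is a prefix of the indexed term
            matches = lambda term: term.startswith(key)
        elif mode == "trailing":   # keyword is a suffix of the indexed term
            matches = lambda term: term.endswith(key)
        else:
            raise ValueError("mode must be 'exact', 'leading' or 'trailing'")
        hits = set()
        # Linear scan over the vocabulary; fine for a sketch, slow at scale.
        for term, doc_ids in self.postings.items():
            if matches(term):
                hits |= doc_ids
        return hits


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "भारतीय भाषा")      # "bharatiya bhasha"
    index.add(2, "भारत की भाषाएँ")   # "bharat ki bhashaen"
    # Query spelled with a short "i"; the fold lets it match document 1.
    print(sorted(index.search("भारतिय")))
    print(sorted(index.search("भारत", mode="leading")))
    print(sorted(index.search("षा", mode="trailing")))
```

The linear scan in `search` keeps the sketch short. A system of the kind the abstract describes would more plausibly use a sorted vocabulary or a trie for leading matches and a reversed-term structure for trailing matches, but the abstract does not say which approach the authors took.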