Concept search in Urdu

  • Authors:
  • Kashif Riaz

  • Affiliations:
  • University of Minnesota, Minneapolis, MN, USA

  • Venue:
  • Proceedings of the 2nd PhD workshop on Information and knowledge management
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a thesis proposal to do concept search in non English and non European languages. Urdu is chosen as an example language because of its unique nature, morphology and a large number of speakers. Besides its importance, Urdu does not have adequate language resources to do intellectual research in Information Retrieval (IR). It is shown that methods used for English language for concept searching are inadequate for Urdu. Some novel approaches for concept searching are also presented. Pre-processing IR tasks such as stop word identification and stemming require complex research for a morphological rich language like Urdu. Named-entity identification is hypothesized to be useful in determining the concept being sought by the user and research plan includes an implementation of named-entity identification for Urdu. An Urdu language toolkit will be made available to the IR community for Urdu language processing. Finally, a TREC like evaluation criteria is presented with relevance judgments, test collection and queries for Urdu IR.