Concept search in Urdu

Authors: Kashif Riaz
Affiliations: University of Minnesota, Minneapolis, MN, USA
Venue: Proceedings of the 2nd PhD workshop on Information and knowledge management
Year: 2008

Abstract

This paper describes a thesis proposal for concept search in non-English, non-European languages. Urdu is chosen as an example language because of its unique nature, its morphology, and its large number of speakers. Despite its importance, Urdu lacks adequate language resources for research in Information Retrieval (IR). The paper shows that the methods used for concept searching in English are inadequate for Urdu, and it presents some novel approaches for concept searching. Pre-processing tasks in IR, such as stop word identification and stemming, require complex research for a morphologically rich language like Urdu. Named-entity identification is hypothesized to be useful in determining the concept sought by the user, and the research plan includes an implementation of named-entity identification for Urdu. An Urdu language toolkit will be made available to the IR community for Urdu language processing. Finally, TREC-like evaluation criteria are presented, with relevance judgments, a test collection, and queries for Urdu IR.
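
To make the TREC-like evaluation setting concrete, the following is a minimal sketch of how a ranked retrieval run could be scored against relevance judgments. The abstract does not say which measure the proposal uses; mean average precision is assumed here only because it is a common choice in TREC-style evaluations. The function names, the query and document identifiers, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked_doc_ids: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc ids.

    Assumes the ranked list contains no duplicate document ids.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relevance judgments (query id -> relevant document ids).
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
# Hypothetical system output (query id -> ranked document ids).
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d2"]}

map_score = mean_average_precision(runs, qrels)
print(f"MAP = {map_score:.4f}")
```

In a real Urdu test collection, the judgments and runs would be read from files rather than written inline, and other measures could be substituted without changing the overall structure.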