This paper describes a thesis proposal for concept search in non-English, non-European languages. Urdu is chosen as the example language because of its unique nature and morphology and its large number of speakers. Despite its importance, Urdu lacks adequate language resources for research in Information Retrieval (IR). It is shown that the concept-searching methods used for English are inadequate for Urdu, and some novel approaches to concept searching are presented. Pre-processing IR tasks such as stop word identification and stemming require complex research for a morphologically rich language like Urdu. Named-entity identification is hypothesized to be useful in determining the concept sought by the user, and the research plan includes an implementation of named-entity identification for Urdu. An Urdu language toolkit will be made available to the IR community for Urdu language processing. Finally, TREC-like evaluation criteria are presented, with relevance judgments, a test collection, and queries for Urdu IR.