Improving Persian information retrieval systems using stemming and part of speech tagging

Authors:
Reza Karimpour;Amineh Ghorbani;Azadeh Pishdad;Mitra Mohtarami;Abolfazl AleAhmad;Hadi Amiri;Farhad Oroumchian
Affiliations:
Electrical and Computer Engineering Faculty, University of Tehran;Electrical and Computer Engineering Faculty, University of Tehran;Electrical and Computer Engineering Faculty, University of Tehran;Electrical and Computer Engineering Faculty, University of Tehran;Electrical and Computer Engineering Faculty, University of Tehran;Electrical and Computer Engineering Faculty, University of Tehran;University of Wollongong in Dubai
Venue:
CLEF'08 Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access
Year:
2008

Abstract

With the emergence of vast information resources, it is necessary to develop methods that retrieve the most relevant information according to users' needs. Such retrieval methods may benefit from natural language constructs, which can boost their results by achieving higher precision and recall. In this study, we use the part-of-speech properties of terms as an extra source of information about document and query terms and evaluate the impact of such data on the performance of Persian retrieval algorithms. Furthermore, we examine the effect of stemming as a complement to this research. Our findings indicate that part-of-speech tags may have only a small influence on the effectiveness of the retrieved results. However, when this information is combined with stemming, it improves the accuracy of the outcomes considerably.