Hamshahri: A standard Persian text collection

Authors:
Abolfazl AleAhmad;Hadi Amiri;Ehsan Darrudi;Masoud Rahgozar;Farhad Oroumchian
Affiliations:
Database Research Group, Control and Intelligent Processing Center Of Excellence, School of Electrical and Computer Engineering, Campus #2, University of Tehran, North Kargar St., Tehran, Iran;Database Research Group, Control and Intelligent Processing Center Of Excellence, School of Electrical and Computer Engineering, Campus #2, University of Tehran, North Kargar St., Tehran, Iran;Database Research Group, Control and Intelligent Processing Center Of Excellence, School of Electrical and Computer Engineering, Campus #2, University of Tehran, North Kargar St., Tehran, Iran;Database Research Group, Control and Intelligent Processing Center Of Excellence, School of Electrical and Computer Engineering, Campus #2, University of Tehran, North Kargar St., Tehran, Iran;Database Research Group, Control and Intelligent Processing Center Of Excellence, School of Electrical and Computer Engineering, Campus #2, University of Tehran, North Kargar St., Tehran, Iran and ...
Venue:
Knowledge-Based Systems
Year:
2009

Abstract

The Persian language is one of the dominant languages in the Middle East, so there is a significant amount of Persian documents available on the Web. Because Persian differs in nature from other languages such as English, the design of information retrieval systems for Persian requires special consideration. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper, we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. Furthermore, statistical information about the documents, the queries and their relevance judgments is presented in this paper. We believe that this collection is the largest Persian text collection so far.
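As a rough illustration of how TREC-style relevance judgments are typically used, the sketch below parses judgment lines in the common TREC qrels layout (topic, iteration, document number, relevance) and computes average precision for one ranked list. It is a minimal sketch and not the authors' tooling: the function names, the sample judgments and the document identifiers (DOC-A, DOC-B, ...) are hypothetical, and the abstract does not state the exact file layout of the Hamshahri judgments.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines of the form 'topic iteration docno relevance'.

    Returns a dict mapping each topic id to the set of relevant document ids.
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        if int(rel) > 0:
            relevant[topic].add(docno)
    return dict(relevant)


def average_precision(ranked_docnos, relevant_docnos):
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant_docnos:
        return 0.0
    hits = 0
    total = 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant_docnos:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant_docnos)


# Hypothetical judgments and ranking, for illustration only.
sample_qrels = [
    "1 0 DOC-A 1",
    "1 0 DOC-B 0",
    "1 0 DOC-C 1",
    "2 0 DOC-D 1",
]
qrels = parse_qrels(sample_qrels)
ranking = ["DOC-A", "DOC-B", "DOC-C"]
ap = average_precision(ranking, qrels["1"])
```

Averaging this value over all topics gives mean average precision, a measure commonly reported on TREC-style collections.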