Improving Persian text classification using Persian thesaurus

  • Authors:
  • Hamid Parvin;Behrouz Minaei-Bidgoli;Atousa Dahbashi

  • Affiliations:
  • School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran;School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran;School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran

  • Venue:
  • CIARP'11 Proceedings of the 16th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
  • Year:
  • 2011

Abstract

This paper proposes an innovative approach to improve the performance of Persian text classification. The proposed method uses a thesaurus as auxiliary knowledge to obtain the real frequencies of words in the corpus. Three types of relationships are considered in our thesaurus. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Experimental results show a significant improvement when the Persian thesaurus is employed rather than common methods.
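
The idea of obtaining "real" word frequencies through a thesaurus can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three relationship types or say how they are combined, so the sketch assumes a single generic "related word" mapping. The names `TOY_THESAURUS` and `thesaurus_frequencies` are hypothetical, and English words stand in for Persian entries purely for readability.

```python
from collections import Counter

# Hypothetical toy thesaurus: each related word maps to one canonical term.
# The entries are illustrative only and are not taken from the paper.
TOY_THESAURUS = {
    "automobile": "car",
    "vehicle": "car",
    "physician": "doctor",
}


def thesaurus_frequencies(tokens, thesaurus):
    """Count tokens after folding related words onto their canonical term.

    Words that are absent from the thesaurus are counted under themselves.
    """
    return Counter(thesaurus.get(tok, tok) for tok in tokens)


tokens = ["car", "automobile", "doctor", "physician", "vehicle", "road"]

# Surface-form counts: every distinct string is its own feature.
raw = Counter(tokens)

# Thesaurus-aware counts: related words contribute to a shared feature.
merged = thesaurus_frequencies(tokens, TOY_THESAURUS)
```

In this toy input, the surface counts treat "car", "automobile", and "vehicle" as three unrelated features, whereas the merged counts credit all three occurrences to one term. A classifier trained on the merged counts would therefore see less sparse features, which is one plausible reading of how a thesaurus could help.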