Unsupervised discovery of Persian morphemes

  • Authors:
  • Mohsen Arabsorkhi;Mehrnoush Shamsfard

  • Affiliations:
  • Shiraz University, Shiraz, Iran;Shahid Beheshti University, Tehran, Iran

  • Venue:
  • EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper reports the present results of a research on unsupervised Persian morpheme discovery. In this paper we present a method for discovering the morphemes of Persian language through automatic analysis of corpora. We utilized a Minimum Description Length (MDL) based algorithm with some improvements and applied it to Persian corpus. Our improvements include enhancing the cost function using some heuristics, preventing the split of high frequency chunks, exploiting penalty for first and last letters and distinguishing pre-parts and post-parts. Our improved approach has raised the precision, recall and f-measure of discovery by respectively %32, %17 and %23.