Unsupervised identification of persian compound verbs

  • Authors:
  • Mohammad Sadegh Rasooli;Heshaam Faili;Behrouz Minaei-Bidgoli

  • Affiliations:
  • Department of Computer Engineering, Iran University of Science and Technology, Iran;School of Electrical & Computer Engineering, Tehran University, Iran;Department of Computer Engineering, Iran University of Science and Technology, Iran

  • Venue:
  • MICAI'11 Proceedings of the 10th Mexican international conference on Advances in Artificial Intelligence - Volume Part I
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

One of the main tasks related to multiword expressions (MWEs) is compound verb identification. There have been so many works on unsupervised identification of multiword verbs in many languages, but there has not been any conspicuous work on Persian language yet. Persian multiword verbs (known as compound verbs), are a kind of light verb construction (LVC) that have syntactic flexibility such as unrestricted word distance between the light verb and the nonverbal element. Furthermore, the nonverbal element can be inflected. These characteristics have made the task in Persian very difficult. In this paper, two different unsupervised methods have been proposed to automatically detect compound verbs in Persian. In the first method, extending the concept of pointwise mutual information (PMI) measure, a bootstrapping method has been applied. In the second approach, K-means clustering algorithm is used. Our experiments show that the proposed approaches have gained results superior to the baseline which uses PMI measure as its association metric.