A matrix representation of the inflectional forms of Arabic words: a study of co-occurrence patterns

  • Authors:
  • H. E. Mahgoub; M. A. Hashish; A. T. Hassanein

  • Affiliations:
  • IBM Cairo Scientific Centre, Mohandessen, Giza, Egypt; IBM Cairo Scientific Centre, Mohandessen, Giza, Egypt; American University in Cairo

  • Venue:
  • COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 3
  • Year:
  • 1990

Abstract

A "Matrix" method is proposed for representing the inflectional paradigms of Arabic words. This representation classifies Arabic words into a tree structure (Fig. 1) whose leaves represent unique conjugational or derivational paradigms, each expressed in the proposed "Matrix" form.

A study of about 2,500 stems from a high-frequency Arabic wordlist due to Landau has revealed a systematic set of co-occurrence patterns for the enclitic pronouns of Arabic verbs and for the possessive pronouns attached to Arabic nouns. Each co-occurrence pattern represents a subcategorization frame that reflects the underlying semantic relationship.

The key feature that distinguishes these semantic patterns has been observed to be whether the attached suffixes refer to animate or inanimate entities. In some cases for verbs, the number of the subject is also a significant feature. These semantic features also extend to non-attached subjects and objects (for verbs) and to possessive noun complements (for nouns). The semantic classes presented in this paper therefore also assist in syntactic/semantic analysis.

The first application developed from the proposed representation is a stem-based Arabic morphological analyser, from which a spell checker (on a PS/2 microcomputer) emerged as a by-product. Currently, the system is being used to interact with an Arabic syntactic parser, and there are plans to use it in a machine-assisted translation system.
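As a rough illustration of the idea of a co-occurrence pattern, the sketch below encodes, for each stem, which kinds of attached pronoun suffix it has been observed with, and groups stems that share the same row of flags. It is only a minimal sketch under stated assumptions: the feature labels, the stem names, and the functions `pattern_of` and `group_by_pattern` are all hypothetical, and the paper's actual matrix layout and pattern inventory are not given in this abstract.

```python
from collections import defaultdict

# Hypothetical feature labels: the abstract names only the animate/inanimate
# distinction, not the full inventory used in the paper.
SUFFIX_FEATURES = ("animate", "inanimate")


def pattern_of(observed):
    """Return one row of 0/1 flags, one flag per suffix feature."""
    return tuple(1 if feature in observed else 0 for feature in SUFFIX_FEATURES)


def group_by_pattern(stem_observations):
    """Group stems whose co-occurrence rows are identical."""
    classes = defaultdict(list)
    for stem, observed in stem_observations.items():
        classes[pattern_of(observed)].append(stem)
    return dict(classes)


# Placeholder stems for illustration only; these are not data from the paper.
observations = {
    "stem_a": {"animate"},
    "stem_b": {"animate", "inanimate"},
    "stem_c": {"inanimate"},
    "stem_d": {"animate"},
}

classes = group_by_pattern(observations)
for row, stems in classes.items():
    print(row, stems)
```

Each distinct row then plays the role of one class of stems. A fuller version might add a flag for the number of the subject, which the abstract says is significant for some verbs.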