The architecture of a standard Arabic lexical database: some figures, ratios and categories from the DIINAR.1 source program

  • Authors:
  • Ramzi Abbès;Joseph Dichy;Mohamed Hassoun

  • Affiliations:
  • SII / SILAT, ENSSIB, Villeurbanne Cedex, France;ÉLISA / SILAT, Université Lumière-Lyon, Lyon Cedex, France;SII / SILAT, ENSSIB, Villeurbanne Cedex, France

  • Venue:
  • Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
  • Year:
  • 2004

Abstract

This paper is a contribution to the issue, which has become critical in the course of the last decade, of the basic requirements and validation criteria for lexical language resources in Standard Arabic. The work is based on a critical analysis of the architecture of the DIINAR.1 lexical database, the entries of which are associated with grammar-lexis relations operating at word-form level (i.e. in morphological analysis). Investigation shows a crucial difference, in the concept of 'lexical database', between the source program and the generated lexica. The source program underlying DIINAR.1 is analysed, and some figures and ratios are presented. In the course of this scrutiny, the original categorisations are partly revisited. Results and ratios are given here for basic entries on the one hand, and for generated lexica of inflected word-forms on the other. They aim at giving a first answer to the question of the ratios between the number of lemma-entries and the number of inflected word-forms that can be expected to be included in, or generated by, a Standard Arabic lexical database. These ratios can be considered as one overall language-specific criterion for the analysis, evaluation and validation of lexical databases in Arabic.