The architecture of a standard Arabic lexical database: some figures, ratios and categories from the DIINAR.1 source program

Authors:
Ramzi Abbès;Joseph Dichy;Mohamed Hassoun
Affiliations:
SII / SILAT, ENSSIB, Villeurbanne Cedex, France;ÉLISA / SILAT, Université Lumière-Lyon, Lyon Cedex, France;SII / SILAT, ENSSIB, Villeurbanne Cedex, France
Venue:
Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
Year:
2004

Abstract

This paper contributes to an issue that has become critical over the last decade: the basic requirements and validation criteria for lexical language resources in Standard Arabic. The work is based on a critical analysis of the architecture of the DIINAR.1 lexical database, the entries of which are associated with grammar-lexis relations operating at word-form level (i.e. in morphological analysis). The investigation reveals a crucial distinction, within the concept of 'lexical database', between the source program and the lexica generated from it. The source program underlying DIINAR.1 is analysed, and some figures and ratios are presented. In the course of this scrutiny, the original categorisations are partly revisited. Results and ratios are given for basic entries on the one hand, and for generated lexica of inflected word-forms on the other. They aim to give a first answer to the question of the ratios between the number of lemma-entries and the number of inflected word-forms that can be expected to be included in, or generated by, a Standard Arabic lexical database. These ratios can be considered as one overall language-specific criterion for the analysis, evaluation and validation of lexical databases in Arabic.