The SALAH project: segmentation and linguistic analysis of ḥadīṯ Arabic texts

  • Authors:
  • Marco Boella;Francesca Romana Romani;Anjela Al-Raies;Cristina Solimando;Giuliano Lancioni

  • Affiliations:
  • University;Roma Tre University, Rome, Italy;Roma Tre University, Rome, Italy;Roma Tre University, Rome, Italy;Roma Tre University, Rome, Italy

  • Venue:
  • AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
  • Year:
  • 2011

Abstract

A model for the unsupervised segmentation and linguistic analysis of Arabic texts of Prophetic tradition (ḥadīṯs), SALAH, is proposed. The model automatically segments each text unit into a transmitter chain (isnād) and a text content (matn), and then analyses each segment through a separate pipeline: a set of regular expressions chunks the transmitter chain into a graph labeled with the relations between transmitters, while a tailored, augmented version of the AraMorph morphological analyzer (RAM) annotates the text content lexically and morphologically. The final output of the system is a graph of relations among transmitters and a lemmatized text corpus, both in XML format; this output can in turn feed the automatic generation of concordances of the texts with variable-sized windows. The results can be useful for a variety of purposes, including retrieving information from ḥadīṯ texts, verifying the relations between transmitters, finding variant readings, and supplying lexical information to specialized dictionaries.
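The isnād pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the SALAH implementation: the abstract does not give the actual regular expressions, so the transliterated transmission formulas in `RELATION_TERMS`, the `qāla:` boundary cue, the `COLLECTOR` placeholder node, and the XML element and attribute names are all assumptions made for illustration, and the sample text uses placeholder transmitter names rather than a real ḥadīṯ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, transliterated transmission formulas; the real system works
# on Arabic script and its pattern set is not given in the abstract.
RELATION_TERMS = ["ḥaddaṯanā", "aḫbaranā", "samiʿtu", "ʿan"]

# Assumed cue marking the end of the chain and the start of the matn.
MATN_MARKER = re.compile(r"\s+qāla:\s*")

# A relation term must be a whole whitespace-delimited token.
RELATION_RE = re.compile(
    r"(?:^|\s+)(" + "|".join(map(re.escape, RELATION_TERMS)) + r")\s+"
)


def split_isnad_matn(hadith):
    """Split a text unit into (isnād, matn) at the first boundary cue."""
    parts = MATN_MARKER.split(hadith.strip(), maxsplit=1)
    isnad = parts[0]
    matn = parts[1] if len(parts) > 1 else ""
    return isnad, matn


def isnad_edges(isnad, collector="COLLECTOR"):
    """Chunk a chain into (receiver, relation, source) triples.

    re.split with a capturing group returns names and relation terms
    interleaved: [name0, rel1, name1, rel2, name2, ...].
    """
    pieces = RELATION_RE.split(isnad.strip())
    # If the chain opens with a formula, the receiver is the (unnamed) collector.
    receiver = pieces[0].strip() or collector
    edges = []
    for relation, source in zip(pieces[1::2], pieces[2::2]):
        source = source.strip()
        edges.append((receiver, relation, source))
        receiver = source  # the source becomes the receiver of the next link
    return edges


def edges_to_xml(edges):
    """Serialize the chain as XML (element names are illustrative only)."""
    root = ET.Element("isnad")
    for receiver, relation, source in edges:
        ET.SubElement(
            root, "link", {"from": source, "to": receiver, "relation": relation}
        )
    return ET.tostring(root, encoding="unicode")


sample = "ḥaddaṯanā Rāwī A ʿan Rāwī B ʿan Rāwī C qāla: matn text here"
isnad, matn = split_isnad_matn(sample)
edges = isnad_edges(isnad)
xml_out = edges_to_xml(edges)
```

Splitting on whole-token formulas keeps multi-word transmitter names intact, and storing each link as a triple makes it straightforward to merge chains from many text units into a single transmitter graph.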