Processing Amazighe language

  • Authors:
  • Mohamed Outahajala

  • Affiliations:
  • Royal Institute for Amazighe Culture, Rabat, Morocco and Ecole Mohammadia d'Ingénieurs, Université Med V, Rabat, Morocco

  • Venue:
  • NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
  • Year:
  • 2011

Abstract

Amazighe is a language spoken by millions of people, mostly in North Africa; however, it suffers from a scarcity of resources. The aim of this PhD thesis is to contribute elementary resources and tools for processing this language. To this end, we have built an annotated corpus of ∼20k tokens and trained two sequence classification models using Support Vector Machines (SVMs) and Conditional Random Fields (CRFs). We used 10-fold cross-validation to evaluate our approach. Results show that the performance of SVMs and CRFs is very comparable; however, CRFs outperformed SVMs on the 10-fold average (88.66% vs. 88.27%). As future work, we plan to use semi-supervised techniques to accelerate part-of-speech (POS) annotation in order to increase accuracy, and afterwards to approach base phrase chunking.
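
The 10-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus uses placeholder words and tags rather than Amazighe data, and the most-frequent-tag baseline is a hypothetical stand-in for a tagger, not the SVM or CRF models of the thesis. The sketch only shows how per-fold accuracies are computed and averaged.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_baseline(sentences):
    """Hypothetical stand-in tagger: remember the most frequent tag of each word."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            all_tags[tag] += 1
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lexicon, default_tag


def token_accuracy(model, sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    lexicon, default_tag = model
    correct = total = 0
    for sentence in sentences:
        for word, tag in sentence:
            correct += lexicon.get(word, default_tag) == tag
            total += 1
    return correct / total


def cross_validate(sentences, k=10):
    """Train on k-1 folds, test on the held-out fold, and average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(sentences), k):
        held = set(held_out)
        train = [s for i, s in enumerate(sentences) if i not in held]
        test = [sentences[i] for i in held_out]
        scores.append(token_accuracy(train_baseline(train), test))
    return sum(scores) / len(scores), scores


# Placeholder corpus (assumption): 20 two-token "sentences" with made-up words and tags.
toy_corpus = [
    [("w%d" % (i % 4), "N" if i % 2 == 0 else "V"), ("x", "P")]
    for i in range(20)
]
mean_accuracy, fold_scores = cross_validate(toy_corpus, k=10)
print("folds:", len(fold_scores), "mean accuracy:", mean_accuracy)
```

To compare two models in this way, both would be evaluated on the same folds so that the averaged scores are directly comparable.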