Processing Amazighe language

Authors:
Mohamed Outahajala
Affiliations:
Royal Institute for Amazighe Culture, Rabat, Morocco and Ecole Mohammadia d'Ingénieurs, Université Med V, Rabat, Morocco
Venue:
NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
Year:
2011

Abstract

Amazighe is a language spoken by millions of people, mainly in North Africa; however, it suffers from a scarcity of resources. The aim of this PhD thesis is to contribute elementary resources and tools for processing this language. To this end, we have built an annotated corpus of ∼20k tokens and trained two sequence classification models using Support Vector Machines (SVMs) and Conditional Random Fields (CRFs). We used 10-fold cross-validation to evaluate our approach. Results show that the performance of SVMs and CRFs is very comparable; however, CRFs outperformed SVMs on the average over the 10 folds (88.66% vs. 88.27%). As future work, we plan to use semi-supervised techniques to accelerate part-of-speech (POS) annotation in order to increase accuracy, and afterwards to approach base phrase chunking.
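
The 10-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The helper names (`k_fold_indices`, `train_baseline`, `cross_validate`) and the tiny placeholder corpus are assumptions made for illustration, and a most-frequent-tag baseline stands in for the SVM and CRF models, whose features and settings the abstract does not describe.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        # The first (n_items % k) folds receive one extra item.
        size = n_items // k + (1 if i < n_items % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_baseline(sentences):
    """Stand-in model (assumption): most frequent tag per token."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for token, tag in sent:
            counts[token][tag] += 1
            all_tags[tag] += 1
    default_tag = all_tags.most_common(1)[0][0]  # used for unseen tokens
    lexicon = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
    return lexicon, default_tag


def accuracy(model, sentences):
    """Token-level accuracy of the baseline on held-out sentences."""
    lexicon, default_tag = model
    correct = total = 0
    for sent in sentences:
        for token, tag in sent:
            correct += lexicon.get(token, default_tag) == tag
            total += 1
    return correct / total


def cross_validate(sentences, k=10):
    """Train on k-1 folds, test on the remaining one, for every fold."""
    scores = []
    for held_out in k_fold_indices(len(sentences), k):
        held = set(held_out)
        train = [s for i, s in enumerate(sentences) if i not in held]
        test = [sentences[i] for i in held_out]
        scores.append(accuracy(train_baseline(train), test))
    return scores, sum(scores) / len(scores)


# Placeholder corpus (not real Amazighe data): 20 two-token sentences.
corpus = [[("t%d" % (i % 3), "V" if i % 3 == 0 else "N"), ("x", "P")]
          for i in range(20)]
scores, mean_score = cross_validate(corpus, k=10)
print(len(scores), mean_score)
```

The reported comparison (88.66% vs. 88.27%) corresponds to the value returned as `mean_score`, computed once per model over the same ten folds. This sketch assumes the corpus contains at least `k` sentences, so that no fold is empty.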