Amazighe is a language spoken by millions of people, mostly in North Africa, but it suffers from a scarcity of resources. This PhD thesis aims to contribute basic resources and tools for processing the language. To this end, we built an annotated corpus of ∼20k tokens and trained two sequence classification models, one using Support Vector Machines (SVMs) and one using Conditional Random Fields (CRFs). We evaluated our approach with 10-fold cross-validation. The results show that SVMs and CRFs perform very similarly, although CRFs slightly outperformed SVMs on average over the 10 folds (88.66% vs. 88.27%). As future work, we plan to use semi-supervised techniques to accelerate part-of-speech (POS) annotation and thereby increase accuracy, and then to address base phrase chunking.
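The evaluation protocol described above (10-fold cross-validation, reporting the average over the folds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: it assumes the corpus is a list of sentences made of (token, tag) pairs, splits folds at the sentence level (the abstract does not say how folds were formed), and uses a hypothetical most-frequent-tag baseline, `train_most_frequent_tag`, as a stand-in for the SVM and CRF taggers, which would be plugged in through the same `train_fn` argument.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Iterator, List, Sequence, Tuple

Sentence = List[Tuple[str, str]]  # assumed format: (token, tag) pairs
Tagger = Callable[[List[str]], List[str]]


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; the k test folds partition range(n)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def train_most_frequent_tag(train: Sequence[Sentence]) -> Tagger:
    """Hypothetical stand-in for an SVM or CRF tagger.

    Tags each known token with its most frequent training tag and each
    unknown token with the most frequent tag overall.
    """
    per_token = defaultdict(Counter)
    overall = Counter()
    for sent in train:
        for tok, tag in sent:
            per_token[tok][tag] += 1
            overall[tag] += 1
    fallback = overall.most_common(1)[0][0]

    def tag(tokens: List[str]) -> List[str]:
        return [per_token[t].most_common(1)[0][0] if t in per_token else fallback
                for t in tokens]

    return tag


def token_accuracy(tagger: Tagger, test: Sequence[Sentence]) -> float:
    """Fraction of test tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in test:
        gold = [tag for _, tag in sent]
        pred = tagger([tok for tok, _ in sent])
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total


def cross_validate(corpus: Sequence[Sentence],
                   train_fn: Callable[[Sequence[Sentence]], Tagger],
                   k: int = 10, seed: int = 0) -> Tuple[List[float], float]:
    """Return the per-fold accuracies and their mean over the k folds."""
    if k < 2 or len(corpus) < k:
        raise ValueError("need k >= 2 and at least k sentences")
    scores = []
    for train_idx, test_idx in k_fold_splits(len(corpus), k, seed):
        tagger = train_fn([corpus[i] for i in train_idx])
        scores.append(token_accuracy(tagger, [corpus[i] for i in test_idx]))
    return scores, sum(scores) / len(scores)
```

Reporting the mean of the per-fold accuracies is one common convention; pooling all test tokens across folds gives a slightly different figure when folds differ in size. Comparing two models, as in the SVM vs. CRF result above, would mean calling `cross_validate` once per `train_fn` with the same `seed`, so both models see identical folds.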