In this paper we present two well-known categorization methods and their use in topic identification for Modern Standard Arabic. The first is the TFIDF approach, and the second is a Support Vector Machine (SVM) based classifier. To the best of our knowledge, little previous work exists on Arabic topic identification, the task we investigate in this article. Our corpus is extracted from the Arabic daily newspaper 'Alkhabar' and contains 6,000 news articles, nearly 3 million words in total, covering four topics: local news, sport, international news and economy. Our experiments give encouraging results for both the SVM and the TFIDF classifiers, particularly when compared with results reported for French. However, the SVM classifier performed better and showed a strong ability to discriminate between topics. We also show the effect of vocabulary size on improving the results. Further experiments determine the minimum number of words needed to achieve acceptable results, which is particularly relevant for speech recognition when language adaptation is needed.
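The abstract does not specify how the TFIDF classifier scores topics. The following is a minimal sketch, assuming a Rocchio-style scheme: each topic is represented by the sum of its training documents' TF-IDF vectors, and a new article is assigned to the topic whose centroid has the highest cosine similarity. The class name `TfidfTopicClassifier`, the `vocab_size` and `max_words` parameters (loosely mirroring the vocabulary-size and minimum-word experiments), and the toy English training documents are hypothetical illustrations, not the authors' implementation or the Alkhabar data.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization only; real Arabic text would also need
    # normalization (e.g. of alef/hamza variants) and possibly stemming.
    return text.split()


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class TfidfTopicClassifier:
    """Hypothetical Rocchio-style classifier: one TF-IDF centroid per topic."""

    def __init__(self, vocab_size=None):
        # vocab_size: keep only the N most frequent training words (None = all).
        self.vocab_size = vocab_size

    def _vector(self, tokens):
        # Raw term frequency times inverse document frequency, in-vocabulary words only.
        tf = Counter(t for t in tokens if t in self.vocab)
        return {w: c * self.idf[w] for w, c in tf.items()}

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        counts = Counter(w for toks in tokenized for w in toks)
        self.vocab = {w for w, _ in counts.most_common(self.vocab_size)}
        # Document frequency: number of training documents containing each word.
        df = Counter(w for toks in tokenized for w in set(toks) if w in self.vocab)
        n = len(docs)
        self.idf = {w: math.log(n / df[w]) for w in self.vocab}
        # Each topic centroid is the sum of its documents' TF-IDF vectors.
        self.centroids = {}
        for toks, label in zip(tokenized, labels):
            self.centroids.setdefault(label, Counter()).update(self._vector(toks))
        return self

    def predict(self, text, max_words=None):
        # max_words keeps only the first words of the input (None = all), which
        # mimics testing how many words are needed for a reliable decision.
        vec = self._vector(tokenize(text)[:max_words])
        # Ties (including an all-zero vector) go to the first topic seen in training.
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


# Toy English placeholder data, not the Alkhabar corpus.
train_docs = [
    "match goal team player coach",
    "team wins match league goal",
    "bank market inflation prices growth",
    "market oil prices export bank",
]
train_labels = ["sport", "sport", "economy", "economy"]
clf = TfidfTopicClassifier().fit(train_docs, train_labels)

print(clf.predict("the team scored a goal"))                  # sport
print(clf.predict("oil prices rise"))                         # economy
print(clf.predict("oil prices team goal goal", max_words=2))  # economy
```

The same TF-IDF vectors could instead be fed to a linear SVM, for example scikit-learn's `LinearSVC`, which would correspond to the second classifier described above; that variant is not shown here.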