This paper focuses on Topic Detection (TD) for unvowelized Arabic documents. Our topic detection system was implemented with two different metrics: an adapted TF-IDF measure and the Jaccard index. The experiments studied the impact of representing words by their stems or by their roots, and of using all words or only nouns. To enhance the TD system, we developed a multi-word term (MWT) extraction prototype that generates MWT vocabularies. To the best of our knowledge, MWT vocabularies have not previously been used for topic detection in Arabic documents; in this paper we investigate the impact of such use on the quality of topic detection. We evaluated the resulting systems on Wattan, an Arabic newspaper corpus, using the standard measures of recall, precision, and F-measure.
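The abstract does not give the "adapted" TF-IDF formula or say how topics are represented, so the following is only a minimal sketch under stated assumptions. Each topic is a hypothetical profile of training terms (standing in for stems, roots, or MWTs). The TF-IDF path swaps in plain smoothed IDF with cosine similarity, and the Jaccard path compares term sets. Precision, recall, and F-measure are then computed per topic. The helper names (`detect_topic`, `prf`) and the toy data are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def idf_table(profiles):
    """Smoothed IDF over topic profiles: log(N / df) + 1 (assumed, not the paper's adapted TF-IDF)."""
    n = len(profiles)
    df = Counter(term for terms in profiles.values() for term in set(terms))
    return {term: math.log(n / d) + 1.0 for term, d in df.items()}


def tfidf(terms, idf):
    """Raw term frequency times IDF; terms absent from every profile get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(terms).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two term sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_topic(doc_terms, profiles, method="tfidf"):
    """Hypothetical helper: return the topic whose profile is most similar to the document."""
    if method == "tfidf":
        idf = idf_table(profiles)
        doc_vec = tfidf(doc_terms, idf)
        scores = {k: cosine(doc_vec, tfidf(p, idf)) for k, p in profiles.items()}
    elif method == "jaccard":
        scores = {k: jaccard(doc_terms, p) for k, p in profiles.items()}
    else:
        raise ValueError(f"unknown method: {method}")
    return max(scores, key=scores.get)


def prf(gold, predicted, topic):
    """Precision, recall and F-measure (F1) for a single topic."""
    tp = sum(g == topic and p == topic for g, p in zip(gold, predicted))
    n_pred = sum(p == topic for p in predicted)
    n_gold = sum(g == topic for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy data (hypothetical English tokens standing in for Arabic stems, roots, or MWTs).
profiles = {
    "sport": "match team goal player league".split(),
    "economy": "market bank price oil trade".split(),
}
docs = ["team goal match".split(), "oil price market".split(), "team bank goal".split()]
gold = ["sport", "economy", "economy"]

for method in ("tfidf", "jaccard"):
    pred = [detect_topic(d, profiles, method) for d in docs]
    print(method, pred, prf(gold, pred, "sport"))
```

On this toy data, both metrics assign the third document to "sport" because it shares two terms with that profile and only one with "economy". For the "sport" topic this gives a precision of 0.5 and a recall of 1.0. Replacing single-word tokens with MWTs would change which terms overlap, which is the kind of effect the paper sets out to measure.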