In this paper, we examine the use of keywords in text categorization with SVM. Contrary to common belief, we show that using keywords instead of all words yields better performance in terms of both accuracy and time. Unlike previous studies, which focus on keyword selection metrics, we compare two approaches to keyword selection. In the corpus-based approach, a single set of keywords is selected for all classes. In the class-based approach, a distinct set of keywords is selected for each class. We perform the experiments on the standard Reuters-21578 dataset, with both boolean and tf-idf weighting. Our results show that although tf-idf weighting performs better, boolean weighting can be used where time and space resources are limited. The corpus-based approach with 2000 keywords performs best. However, for a small number of keywords, the class-based approach outperforms the corpus-based approach with the same number of keywords.
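The difference between the two selection approaches, and between the two weighting schemes, can be illustrated with a minimal sketch. The abstract does not state how candidate keywords are scored, so the sketch assumes a summed tf-idf score; the toy documents, the function names, and the tie-breaking rule are all hypothetical, and the SVM training step is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: tokenized documents and their class labels.
docs = [
    ["wheat", "grain", "export", "price"],
    ["grain", "harvest", "wheat", "wheat"],
    ["oil", "crude", "price", "barrel"],
    ["crude", "oil", "opec", "oil"],
]
labels = ["grain", "grain", "crude", "crude"]


def idf_table(documents):
    """Inverse document frequency, log(N / df), over the whole corpus."""
    n = len(documents)
    df = Counter(t for tokens in documents for t in set(tokens))
    return {t: math.log(n / d) for t, d in df.items()}


def term_scores(documents, idf):
    """Assumed scoring: sum of tf * idf of each term over the given documents."""
    scores = Counter()
    for tokens in documents:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * idf[term]
    return scores


def top_k(scores, k):
    """The k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def corpus_based_keywords(documents, k):
    """One keyword set shared by all classes."""
    idf = idf_table(documents)
    return top_k(term_scores(documents, idf), k)


def class_based_keywords(documents, doc_labels, k):
    """A separate keyword set for each class, scored on that class's documents."""
    idf = idf_table(documents)
    by_class = defaultdict(list)
    for tokens, label in zip(documents, doc_labels):
        by_class[label].append(tokens)
    return {c: top_k(term_scores(ds, idf), k) for c, ds in by_class.items()}


def vectorize(tokens, keywords, idf=None):
    """Boolean weights when idf is None, tf-idf weights otherwise."""
    tf = Counter(tokens)
    if idf is None:
        return [1 if tf[t] > 0 else 0 for t in keywords]
    return [tf[t] * idf.get(t, 0.0) for t in keywords]


corpus_kw = corpus_based_keywords(docs, 2)
class_kw = class_based_keywords(docs, labels, 1)
bool_vec = vectorize(docs[1], corpus_kw)
tfidf_vec = vectorize(docs[1], corpus_kw, idf_table(docs))
```

In the corpus-based case every document is projected onto the same keyword list, so a single feature space serves all classifiers. In the class-based case each class gets its own list, which would typically feed a separate binary classifier per class.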