Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS).
Machine Learning.
Making large-scale support vector machine learning practical. Advances in Kernel Methods.
Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Introduction to Modern Information Retrieval.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98: Proceedings of the 10th European Conference on Machine Learning.
Feature Selection for Unbalanced Class Distribution and Naive Bayes. ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning.
Feature Selection via Set Cover. KDEX '97: Proceedings of the 1997 IEEE Knowledge and Data Engineering Exchange Workshop.
Consistency-based search in feature selection. Artificial Intelligence.
Journal of the American Society for Information Science and Technology.
Scoring and Selecting Terms for Text Categorization. IEEE Intelligent Systems.
Introducing a Family of Linear Measures for Feature Selection in Text Categorization. IEEE Transactions on Knowledge and Data Engineering.
A support vector method for multivariate performance measures. ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
The relationship between Precision-Recall and ROC curves. ICML '06: Proceedings of the 23rd International Conference on Machine Learning.
Training linear SVMs in linear time. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Using Laplace and angular measures for Feature Selection in Text Categorisation. International Journal of Advanced Intelligence Paradigms.
This paper studies the performance of the Set Cover (SC) Feature Selection (FS) method on Text Categorisation (TC) and spam detection problems. Several variants of the original method are presented, either to overcome the class imbalance typically present in TC or to increase efficiency. The behaviour of the algorithms is tested on several document collections. The experiments show that these methods greatly reduce the dimensionality of the problem while either preserving the effectiveness of the classification or causing only a slight decrease.
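To make the idea concrete, here is a minimal sketch of greedy set-cover-based feature selection; it is an illustration of the general technique, not the paper's exact algorithm or any of its variants. It treats a term as "covering" a pair of training documents from different classes when the term distinguishes them (present in one, absent in the other), and greedily selects the term covering the most still-uncovered pairs. All names (`set_cover_fs`, the toy documents) are hypothetical.

```python
from itertools import combinations

def set_cover_fs(docs, labels):
    """Greedy set-cover feature selection sketch (assumed, illustrative).

    docs:   list of sets of terms (bag-of-words per document)
    labels: parallel list of class labels
    Returns the list of selected terms.
    """
    # Every cross-class document pair must be distinguished by some term.
    uncovered = {
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if labels[i] != labels[j]
    }
    vocab = set().union(*docs)
    selected = []
    while uncovered:
        # A term t covers pair (i, j) when it occurs in exactly one of the two docs.
        best, best_cov = None, set()
        for t in vocab:
            cov = {(i, j) for (i, j) in uncovered
                   if (t in docs[i]) != (t in docs[j])}
            if len(cov) > len(best_cov):
                best, best_cov = t, cov
        if best is None:  # remaining pairs cannot be distinguished
            break
        selected.append(best)
        vocab.discard(best)
        uncovered -= best_cov
    return selected

# Toy example: two "spam" documents and two "ham" documents.
docs = [{"buy", "cheap"}, {"buy", "now"}, {"meeting", "now"}, {"report"}]
labels = ["spam", "spam", "ham", "ham"]
print(set_cover_fs(docs, labels))  # "buy" alone separates the two classes here
```

The dimensionality reduction reported in the abstract corresponds to `selected` being far smaller than the full vocabulary; the paper's variants adjust how candidates are scored to cope with class imbalance and to speed up this greedy loop.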