Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS).
Machine Learning.
Making large-scale support vector machine learning practical. Advances in Kernel Methods.
Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Introduction to Modern Information Retrieval.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98: Proceedings of the 10th European Conference on Machine Learning.
Feature Selection for Unbalanced Class Distribution and Naive Bayes. ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning.
Feature Selection via Set Cover. KDEX '97: Proceedings of the 1997 IEEE Knowledge and Data Engineering Exchange Workshop.
Consistency-based search in feature selection. Artificial Intelligence.
Journal of the American Society for Information Science and Technology.
Scoring and Selecting Terms for Text Categorization. IEEE Intelligent Systems.
Introducing a Family of Linear Measures for Feature Selection in Text Categorization. IEEE Transactions on Knowledge and Data Engineering.
A support vector method for multivariate performance measures. ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
The relationship between Precision-Recall and ROC curves. ICML '06: Proceedings of the 23rd International Conference on Machine Learning.
Training linear SVMs in linear time. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Using Laplace and angular measures for Feature Selection in Text Categorisation. International Journal of Advanced Intelligence Paradigms.
This paper studies the performance of the Set Cover (SC) Feature Selection (FS) method on Text Categorisation (TC) and spam detection problems. Several variants of the original method are presented, either to overcome the class imbalance typically present in TC or to increase efficiency. The behaviour of the algorithms is tested on several document collections. The experiments show that these methods greatly reduce the dimensionality of the problem while either preserving the effectiveness of the classification or causing only a slight decrease.
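To make the idea concrete, here is a minimal sketch of greedy set-cover-based feature selection; it is an illustration of the general technique, not the paper's exact algorithm or any of its variants. It treats a term as "covering" a pair of training documents from different classes when the term distinguishes them (present in one, absent in the other), and greedily selects the term covering the most still-uncovered pairs. All names (`set_cover_fs`, the toy documents) are hypothetical.

```python
from itertools import combinations

def set_cover_fs(docs, labels):
    """Greedy set-cover feature selection sketch (assumed, illustrative).

    docs:   list of sets of terms (bag-of-words per document)
    labels: parallel list of class labels
    Returns the list of selected terms.
    """
    # Every cross-class document pair must be distinguished by some term.
    uncovered = {
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if labels[i] != labels[j]
    }
    vocab = set().union(*docs)
    selected = []
    while uncovered:
        # A term t covers pair (i, j) when it occurs in exactly one of the two docs.
        best, best_cov = None, set()
        for t in vocab:
            cov = {(i, j) for (i, j) in uncovered
                   if (t in docs[i]) != (t in docs[j])}
            if len(cov) > len(best_cov):
                best, best_cov = t, cov
        if best is None:  # remaining pairs cannot be distinguished
            break
        selected.append(best)
        vocab.discard(best)
        uncovered -= best_cov
    return selected

# Toy example: two "spam" documents and two "ham" documents.
docs = [{"buy", "cheap"}, {"buy", "now"}, {"meeting", "now"}, {"report"}]
labels = ["spam", "spam", "ham", "ham"]
print(set_cover_fs(docs, labels))  # "buy" alone separates the two classes here
```

The dimensionality reduction reported in the abstract corresponds to `selected` being far smaller than the full vocabulary; the paper's variants adjust how candidates are scored to cope with class imbalance and to speed up this greedy loop.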