Set Cover Feature Selection for Text Categorisation and Spam Detection

  • Authors:
  • Elias F. Combarro; Jose Ranilla; Manuel Roberto Berdasco; Elena Montanes; Irene Diaz

  • Affiliations:
  • Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain

  • Venue:
  • International Journal of Advanced Intelligence Paradigms
  • Year:
  • 2009

Abstract

This paper studies the performance of the Set Cover (SC) Feature Selection (FS) method on Text Categorisation (TC) and Spam Detection problems. Several variants of the original method are presented, either to overcome the class imbalance that is common in TC or to improve efficiency. The behaviour of the algorithm is tested on several collections. The experiments show that these methods greatly reduce the dimensionality of the problem while either preserving classification effectiveness or causing only a slight decrease.
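
The abstract does not spell out how the SC method is formulated, so the following is only a minimal sketch of one common way to cast feature selection as a set cover problem, not the authors' algorithm or any of their variants. It assumes each document is a set of terms, and that a term "covers" a pair of documents from different classes when it occurs in exactly one of them. The function name `greedy_set_cover_features` and the toy data are hypothetical.

```python
def greedy_set_cover_features(docs, labels):
    """Greedy set cover over cross-class document pairs (illustrative sketch).

    docs   -- list of sets of terms, one set per document
    labels -- list of class labels, parallel to docs
    Returns (selected_terms, uncovered_pairs).
    """
    n = len(docs)
    # Universe to cover: every pair of documents with different labels.
    pairs = {(i, j) for i in range(n) for j in range(i + 1, n)
             if labels[i] != labels[j]}
    vocab = sorted(set().union(*docs))
    # Assumption: a term covers a pair if it appears in exactly one of the two.
    covers = {t: {(i, j) for (i, j) in pairs if (t in docs[i]) != (t in docs[j])}
              for t in vocab}

    selected, uncovered = [], set(pairs)
    while uncovered and vocab:
        # Pick the term that separates the most still-uncovered pairs.
        best = max(vocab, key=lambda t: len(covers[t] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            break  # the remaining pairs cannot be told apart by any term
        selected.append(best)
        uncovered -= gain
    return selected, uncovered


# Toy usage: two "spam" and two "ham" documents.
docs = [{"cheap", "pills", "offer"}, {"cheap", "offer", "win"},
        {"meeting", "agenda"}, {"agenda", "offer"}]
labels = ["spam", "spam", "ham", "ham"]
selected, uncovered = greedy_set_cover_features(docs, labels)
print(selected, uncovered)
```

In this toy collection a single term is enough to separate every spam/ham pair, which illustrates the kind of dimensionality reduction the abstract refers to. How closely the greedy choice and the pair-based cover match the published method is an open assumption here.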