On feature extraction for spam e-mail detection

  • Authors:
  • Serkan Günal;Semih Ergin;M. Bilginer Gülmezoğlu;Ö. Nezih Gerek

  • Affiliations:
  • The Department of Electrical and Electronics Engineering, Eskişehir Osmangazi University, Eskişehir, Türkiye;The Department of Electrical and Electronics Engineering, Eskişehir Osmangazi University, Eskişehir, Türkiye;The Department of Electrical and Electronics Engineering, Eskişehir Osmangazi University, Eskişehir, Türkiye;The Department of Electrical and Electronics Engineering, Anadolu University, Eskişehir, Türkiye

  • Venue:
  • MRCS'06 Proceedings of the 2006 international conference on Multimedia Content Representation, Classification and Security
  • Year:
  • 2006

Abstract

Electronic mail is an important communication method for most computer users. Spam e-mails, however, consume bandwidth, fill up server storage, and waste the time of those who have to deal with them. The general way to label an e-mail as spam or non-spam is to set up a finite set of discriminative features and use a classifier for the detection. In most cases, the selection of such features is empirically verified. In this paper, two different methods are proposed to select the most discriminative features among a set of reasonably arbitrary features for spam e-mail detection. The selection methods are developed using the Common Vector Approach (CVA), which is a subspace-based pattern classifier. Experimental results indicate that the proposed feature selection methods considerably reduce the number of features without affecting recognition rates.