Classifying e-mails via support vector machine

  • Authors:
  • Lidan Shou;Bin Cui;Gang Chen;Jinxiang Dong

  • Affiliations:
  • College of Computer Science, Zhejiang University, Hangzhou, P.R. China;School of Computing, National University of Singapore, Singapore;College of Computer Science, Zhejiang University, Hangzhou, P.R. China;College of Computer Science, Zhejiang University, Hangzhou, P.R. China

  • Venue:
  • WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
  • Year:
  • 2006

Abstract

To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classification technique. Our work treats E-mail messages as semi-structured documents consisting of a set of fields with predefined semantics and a number of variable-length free-text contents. The main contributions of this paper are as follows. First, we present a Support Vector Machine (SVM) based model that incorporates Principal Component Analysis (PCA) to reduce the size and dimensionality of the input feature space. As a result, the input data become classifiable with fewer features, and training converges faster. Second, we build the classification model using both the $C$-support vector machine and $\nu$-support vector machine algorithms. We study various control parameters for performance tuning in an extensive set of experiments. The results of our performance evaluation indicate that the proposed technique is effective in E-mail classification.
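
The pipeline outlined in the abstract (feature extraction, PCA reduction, then a C-SVM and a nu-SVM classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of scikit-learn, the TF-IDF features over body text only, the RBF kernel, the toy corpus, and the values of `n_components`, `C`, and `nu` are all assumptions chosen for illustration, and the function name `fit_pca_svm` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC, NuSVC

# Hypothetical toy corpus (not from the paper): 1 = junk, 0 = legitimate.
emails = [
    "win a free prize now click here",
    "cheap loans approved instantly apply now",
    "free offer limited time click to claim",
    "earn money fast from home free",
    "meeting agenda for the project review tomorrow",
    "please find the quarterly report attached",
    "lunch on friday with the database group",
    "draft of the conference paper for your comments",
]
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])


def fit_pca_svm(texts, y, n_components=3, C=1.0, nu=0.5):
    """Fit TF-IDF -> PCA -> {C-SVM, nu-SVM}; all parameter values are illustrative."""
    # Turn free text into a dense term-weight matrix (PCA needs dense input).
    X = TfidfVectorizer().fit_transform(texts).toarray()
    # Project onto the leading principal components to shrink the feature space.
    X_reduced = PCA(n_components=n_components).fit_transform(X)
    # Train both SVM variants on the reduced features (RBF kernel is an assumption).
    c_svm = SVC(C=C, kernel="rbf").fit(X_reduced, y)
    nu_svm = NuSVC(nu=nu, kernel="rbf").fit(X_reduced, y)
    return X_reduced, c_svm, nu_svm


X_reduced, c_svm, nu_svm = fit_pca_svm(emails, labels)
print(X_reduced.shape)  # (8, 3): 8 messages, 3 principal components
print(c_svm.predict(X_reduced), nu_svm.predict(X_reduced))
```

In this sketch `C` sets the misclassification penalty of the C-SVM, while `nu` bounds the fraction of margin errors and support vectors in the nu-SVM; these are the kind of control parameters one would sweep in a tuning experiment. The structured header fields that the paper also uses are omitted here.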