Spam Detection Using Text Clustering

  • Authors: Minoru Sasaki; Hiroyuki Shinnou
  • Affiliations: Ibaraki University, Japan (both authors)
  • Venue: CW '05 Proceedings of the 2005 International Conference on Cyberworlds
  • Year: 2005

Abstract

We propose a new spam detection technique using text clustering based on the vector space model. Our method automatically computes disjoint clusters over all spam and non-spam mail using a spherical k-means algorithm and obtains the centroid vector of each cluster to extract the cluster description. Each centroid vector is assigned a label ('spam' or 'non-spam') by counting the spam email in its cluster. When new mail arrives, the cosine similarity between the new mail vector and each centroid vector is calculated, and the label of the most similar cluster is assigned to the new mail. With this method, we can extract many kinds of topics from spam and non-spam email and detect spam email efficiently. In this paper, we describe our spam detection system and show the results of our experiments using the Ling-Spam test collection.
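
The pipeline described in the abstract (cluster, label each centroid, classify by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. The seeding strategy, the fixed iteration count, the majority-vote labeling rule, the four-term toy vocabulary, and all function names are assumptions made for illustration; the abstract states only that labels come from counting the spam email in each cluster.

```python
import math
from collections import Counter

def unit(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(docs, k, iters=20):
    """Cluster unit-normalized document vectors by cosine similarity."""
    docs = [unit(d) for d in docs]
    # Assumption: seed centroids with evenly spaced documents (deterministic).
    centroids = [docs[i * len(docs) // k] for i in range(k)]
    assign = [0] * len(docs)
    for _ in range(iters):
        # Assign each document to its most similar centroid.
        assign = [max(range(k), key=lambda c: cosine(d, centroids[c]))
                  for d in docs]
        # Recompute each centroid as the normalized sum of its members.
        for c in range(k):
            members = [d for d, a in zip(docs, assign) if a == c]
            if members:
                centroids[c] = unit([sum(col) for col in zip(*members)])
    return centroids, assign

def label_clusters(assign, labels, k):
    """Assumption: each cluster takes the majority label of its members."""
    result = []
    for c in range(k):
        counts = Counter(l for l, a in zip(labels, assign) if a == c)
        result.append(counts.most_common(1)[0][0] if counts else "non-spam")
    return result

def classify(vec, centroids, cluster_labels):
    """Return the label of the centroid most similar to the new mail vector."""
    v = unit(vec)
    best = max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
    return cluster_labels[best]

# Toy term-count vectors over a hypothetical vocabulary:
# ["free", "money", "meeting", "linguistics"]
train = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 2, 3], [0, 1, 3, 2]]
train_labels = ["spam", "spam", "non-spam", "non-spam"]

centroids, assign = spherical_kmeans(train, k=2)
cluster_labels = label_clusters(assign, train_labels, k=2)
print(classify([4, 1, 0, 0], centroids, cluster_labels))
```

In practice the number of clusters would be larger than the number of classes, so that several spam and non-spam topics each get their own centroid, and the term vectors would typically be weighted (for example with tf-idf) before normalization.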