We propose a new spam detection technique based on text clustering in the vector space model. Our method automatically partitions all spam and non-spam mails into disjoint clusters using a spherical k-means algorithm and obtains the centroid vector of each cluster, from which the cluster description is extracted. Each centroid vector is assigned a label ('spam' or 'non-spam') by counting the spam emails in its cluster. When a new mail arrives, the cosine similarity between the new mail vector and each centroid vector is calculated, and the label of the most similar cluster is assigned to the new mail. With this method, we can extract many kinds of topics from spam and non-spam email and detect spam efficiently. In this paper, we describe our spam detection system and report the results of our experiments on the Ling-Spam test collection.
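The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`spherical_kmeans`, `label_clusters`, `classify`), the majority-vote labeling rule, the fixed iteration count, and the random initialization are all assumptions made for illustration; the abstract only states that labels are derived from the number of spam emails in each cluster. Input rows are assumed to be non-negative term-weight vectors such as term frequencies or TF-IDF.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Cluster unit-normalized rows of X by cosine similarity (assumed variant)."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(np.asarray(X, dtype=float))
    # Assumption: initialize centroids from k distinct documents.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # For unit vectors, the dot product equals the cosine similarity.
        assign = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.sum(axis=0)
        centroids = normalize_rows(centroids)
    assign = np.argmax(X @ centroids.T, axis=1)
    return centroids, assign

def label_clusters(assign, is_spam, k):
    """Assumed rule: a cluster is 'spam' if more than half its members are spam."""
    labels = []
    for j in range(k):
        members = is_spam[assign == j]
        majority_spam = len(members) > 0 and members.mean() > 0.5
        labels.append("spam" if majority_spam else "non-spam")
    return labels

def classify(x, centroids, labels):
    """Return the label of the centroid most similar to the new mail vector x."""
    x = normalize_rows(np.asarray(x, dtype=float).reshape(1, -1))
    return labels[int(np.argmax(centroids @ x[0]))]

# Toy data: columns 0-1 stand for spam-like terms, columns 2-3 for ordinary terms.
X = np.array([[3, 1, 0, 0], [2, 1, 0, 0], [4, 1, 0, 0],
              [0, 0, 2, 1], [0, 0, 1, 3], [0, 0, 2, 2]])
is_spam = np.array([1, 1, 1, 0, 0, 0])

centroids, assign = spherical_kmeans(X, k=2)
labels = label_clusters(assign, is_spam, k=2)
print(classify([5, 2, 0, 0], centroids, labels))
print(classify([0, 0, 1, 1], centroids, labels))
```

Because every vector is normalized to unit length, assigning a mail to the centroid with the largest dot product is equivalent to choosing the cluster with the highest cosine similarity, which is the step the abstract describes for incoming mail.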