Spam filtering using semantic similarity approach and adaptive BPNN

  • Authors:
  • Cheng Hua Li; Jimmy Xiangji Huang

  • Affiliations:
  • School of Information Technology, York University, Toronto, Ontario, Canada M3J 1P3; School of Information Technology, York University, Toronto, Ontario, Canada M3J 1P3

  • Venue:
  • Neurocomputing
  • Year:
  • 2012

Quantified Score

Hi-index 0.01

Abstract

This paper proposes a novel approach to spam filtering based on several semantic similarity measures and an adaptive back propagation neural network (ABPNN). Semantic similarity is a promising way to address the problems of keyword-based spam filtering models. We propose a new method that integrates three kinds of semantic similarity approaches for spam filtering, as a case study of a data mining application. First, we construct a latent semantic feature space from the training data with a statistical method. Second, we build a corpus-based thesaurus by extracting the relationships between words from their co-occurrence in the documents. Third, we combine the latent semantic feature space with the corpus-based thesaurus. The back propagation neural network (BPNN) is an efficient approach to classification; however, the traditional BPNN learns slowly and is easily trapped in local minima. We adopt an adaptive algorithm that improves the traditional BPNN and overcomes these problems. To investigate the effectiveness of our methods, we conduct extensive experiments on the Ling-Spam, PU1 and PU3 data sets. Experimental results show that the proposed system achieves higher performance, especially when the hybrid semantic similarity approach is combined with the adaptive back propagation neural network.
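The three semantic steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the statistical method, the thesaurus weighting, or how the two are combined. The sketch assumes a truncated SVD for the latent space, cosine-normalized document co-occurrence for the thesaurus, and query-expansion-style multiplication for the combination. All function names are hypothetical.

```python
import numpy as np

def latent_semantic_features(X, k):
    """Project documents (rows of X) onto the top-k right singular vectors.
    Assumption: the 'statistical method' is a truncated SVD (LSA-style)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T                      # shape: (n_terms, k)
    return X @ Vk

def cooccurrence_thesaurus(X):
    """Term-term similarity from how often two terms share a document.
    Assumption: cosine-normalized co-occurrence counts."""
    B = (X > 0).astype(float)          # document-term presence matrix
    C = B.T @ B                        # C[i, j] = docs containing both terms
    d = np.sqrt(np.diag(C))            # sqrt of each term's document frequency
    d[d == 0] = 1.0                    # avoid division by zero for unused terms
    return C / np.outer(d, d)

def hybrid_features(X, k):
    """Assumed combination: spread each term's weight to related terms
    through the thesaurus, then project into the latent space."""
    S = cooccurrence_thesaurus(X)
    return latent_semantic_features(X @ S, k)

# Toy document-term matrix: documents 0-1 and documents 2-3 use disjoint vocabularies.
X_demo = np.array([[2, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 3],
                   [0, 0, 2, 1]], dtype=float)
Z_demo = hybrid_features(X_demo, k=2)  # one k-dimensional vector per document
```

The rows of `Z_demo` would then serve as input vectors to a classifier such as the adaptive BPNN described in the abstract.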