Combining neural networks and semantic feature space for email classification

  • Authors: Bo Yu; Dong-hua Zhu

  • Affiliation (both authors): Lab of Knowledge Management and Data Analysis, School of Management and Economics, Beijing Institute of Technology, Beijing 100081, PR China

  • Venue: Knowledge-Based Systems
  • Year: 2009

Abstract

Email is one of the most pervasive applications, used daily by millions of people worldwide; individuals and organizations increasingly rely on email to communicate and to share information and knowledge. However, the growth in email users has brought a dramatic increase in spam over the past few years, and processing and managing email efficiently is becoming a major challenge for both individuals and organizations. This paper proposes new email classification models using a linear neural network trained by the perceptron learning algorithm and a nonlinear neural network trained by the back-propagation learning algorithm. An efficient semantic feature space (SFS) method is introduced in these classification models. The traditional back-propagation neural network (BPNN) learns slowly and is prone to becoming trapped in local minima, so a modified back-propagation neural network (MBPNN) is presented to overcome these limitations. Email classification systems based on the vector space model suffer from a large number of features and from ambiguity in the meaning of terms, which leads to a sparse and noisy feature space. We therefore use the SFS to convert the original sparse and noisy feature space into a semantically richer one, which helps to accelerate learning. Experiments are conducted with different training set sizes and extracted feature sizes. The results show that the models using MBPNN outperform the traditional BPNN, and that the use of SFS can greatly reduce feature dimensionality and improve email classification performance.
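
The pipeline outlined in the abstract (a sparse term space, projection into a lower-dimensional semantic space, then a linear classifier trained by the perceptron rule) can be illustrated with a minimal sketch. The abstract does not specify how the SFS is built, so the sketch substitutes a truncated SVD of the term matrix (latent semantic analysis) as a stand-in; that choice, the toy emails, and all function names are assumptions for illustration, not the authors' method. The MBPNN is not shown.

```python
import numpy as np

def build_term_matrix(docs, vocab):
    """Bag-of-words counts: one row per email, one column per vocabulary term."""
    index = {term: j for j, term in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:
                X[i, index[tok]] += 1.0
    return X

def fit_projection(X, k):
    """Stand-in for SFS (assumption): keep the top-k right singular vectors."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T  # shape: (n_terms, k)

def train_perceptron(Z, y, epochs=50, lr=1.0):
    """Classic perceptron rule with a bias term; labels must be -1 or +1."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for z, t in zip(Z, y):
            if t * (z @ w + b) <= 0:  # misclassified (or on the boundary)
                w += lr * t * z
                b += lr * t
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b

def predict(Z, w, b):
    return np.where(Z @ w + b > 0, 1, -1)

# Hypothetical toy data: +1 = spam, -1 = legitimate.
train_docs = [
    "win free money now",
    "free prize win now",
    "meeting agenda project report",
    "project report meeting notes",
]
y = np.array([1, 1, -1, -1])
vocab = sorted({tok for doc in train_docs for tok in doc.split()})

X = build_term_matrix(train_docs, vocab)  # sparse term space, 10 features
V = fit_projection(X, k=2)                # reduced space, 2 features
Z = X @ V
w, b = train_perceptron(Z, y)
train_pred = predict(Z, w, b)

new_docs = ["free money prize", "project meeting notes"]
new_pred = predict(build_term_matrix(new_docs, vocab) @ V, w, b)
print(train_pred, new_pred)
```

In this toy case the ten term features collapse to two dimensions before training, which mirrors the dimensionality reduction the abstract attributes to SFS; how closely an SVD-based projection matches the paper's actual SFS construction is not something the abstract settles.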