Enhanced Topic-based Vector Space Model for semantics-aware spam filtering

  • Authors:
  • Igor Santos; Carlos Laorden; Borja Sanz; Pablo G. Bringas

  • Affiliations:
  • Laboratory for Smartness, Semantics and Security (S3Lab), University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain (all four authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Abstract

Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. More than 85% of received e-mails are spam. Historical approaches to combat these messages, including simple techniques such as sender blacklisting or the use of e-mail signatures, are no longer completely reliable. Currently, many solutions feature machine-learning algorithms trained using statistical representations of the terms that usually appear in the e-mails. Still, these methods are merely syntactic and are unable to account for the underlying semantics of terms within the messages. In this paper, we explore the use of semantics in spam filtering by representing e-mails with a recently introduced Information Retrieval model: the enhanced Topic-based Vector Space Model (eTVSM). This model is capable of representing linguistic phenomena using a semantic ontology. Based upon this representation, we apply several well-known machine-learning models and show that the proposed method can detect the internal semantics of spam messages.
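
The contrast the abstract draws between purely syntactic term statistics and a topic-based representation can be illustrated with a small sketch. This is not the authors' eTVSM implementation: the `TERM_TOPICS` mapping is a hypothetical toy stand-in for a semantic ontology, and the helper names (`term_vector`, `topic_vector`, `cosine`) are assumptions introduced only for illustration. The sketch shows that two messages built from synonyms share no terms, yet point in the same direction once terms are mapped to topics.

```python
import math
from collections import Counter

# Hypothetical toy "ontology" (an assumption, not from the paper):
# each term maps to weights over topics, and synonyms share a topic.
TERM_TOPICS = {
    "cheap": {"low_price": 1.0},
    "inexpensive": {"low_price": 1.0},
    "pills": {"medication": 1.0},
    "meds": {"medication": 1.0},
    "meeting": {"work": 1.0},
    "agenda": {"work": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def term_vector(text):
    """Purely syntactic representation: raw term counts."""
    return dict(Counter(text.lower().split()))


def topic_vector(text):
    """Topic-based representation: term counts projected onto topics."""
    vec = {}
    for term, count in Counter(text.lower().split()).items():
        for topic, weight in TERM_TOPICS.get(term, {}).items():
            vec[topic] = vec.get(topic, 0.0) + count * weight
    return vec


spam_a = "cheap pills"
spam_b = "inexpensive meds"
ham = "meeting agenda"

# No shared terms, so the syntactic similarity is zero.
syntactic_ab = cosine(term_vector(spam_a), term_vector(spam_b))
# Same topics, so the topic-based similarity is close to one.
semantic_ab = cosine(topic_vector(spam_a), topic_vector(spam_b))
# Disjoint topics, so the topic-based similarity is zero.
semantic_ac = cosine(topic_vector(spam_a), topic_vector(ham))

print(syntactic_ab, semantic_ab, semantic_ac)
```

In a filtering pipeline of the kind the abstract outlines, vectors like those returned by `topic_vector` would be the input features for a standard classifier. The ontology used in the paper and its term weighting are not specified here, so they are left out of the sketch.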