Word sense disambiguation for spam filtering

  • Authors: Carlos Laorden; Igor Santos; Borja Sanz; Gonzalo Alvarez; Pablo G. Bringas

  • Affiliations (in author order):
  • Laboratory for Smartness, Semantics and Security (S3Lab), University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain; Laboratory for Smartness, Semantics and Security (S3Lab), University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain; Laboratory for Smartness, Semantics and Security (S3Lab), University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain; Instituto de Física Aplicada, Consejo Superior de Investigaciones Científicas (CSIC), C/Serrano 144, 28006 Madrid, Spain; Laboratory for Smartness, Semantics and Security (S3Lab), University of Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain

  • Venue: Electronic Commerce Research and Applications
  • Year: 2012

Abstract

Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms, and phishing. More than 86% of received e-mails are spam. Historical approaches to combating these messages, including simple techniques such as sender blacklisting or the use of e-mail signatures, are no longer completely reliable. Many current solutions feature machine-learning algorithms trained using statistical representations of the terms that most commonly appear in such e-mails. However, these methods are merely syntactic and are unable to account for the underlying semantics of terms within messages. In this paper, we explore the use of semantics in spam filtering by introducing a pre-processing step of Word Sense Disambiguation (WSD). Based upon this disambiguated representation, we apply several well-known machine-learning models and show that the proposed method can detect the internal semantics of spam messages.
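
To make the idea of a WSD pre-processing step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-written sense inventory (`SENSES`, hypothetical) and a simplified Lesk-style overlap heuristic; the abstract does not say which WSD algorithm or lexical resource the paper uses. The sketch replaces each ambiguous token with a sense label and then counts those labels, producing the kind of disambiguated term representation that a standard machine-learning classifier could be trained on.

```python
from collections import Counter

# Hypothetical toy sense inventory: word -> {sense label: gloss words}.
# A real system would draw senses from a lexical resource; this is an assumption.
SENSES = {
    "bank": {
        "bank#finance": {"money", "account", "loan", "deposit", "credit"},
        "bank#river": {"river", "water", "shore", "fishing"},
    },
    "interest": {
        "interest#finance": {"rate", "loan", "money", "credit"},
        "interest#curiosity": {"hobby", "attention", "topic"},
    },
}


def disambiguate(tokens):
    """Replace each ambiguous token with the sense whose gloss shares
    the most words with the rest of the message (simplified Lesk)."""
    result = []
    for i, token in enumerate(tokens):
        senses = SENSES.get(token)
        if not senses:
            result.append(token)  # unambiguous or unknown: keep as-is
            continue
        context = set(tokens[:i] + tokens[i + 1:])
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(senses), key=lambda s: len(senses[s] & context))
        result.append(best)
    return result


def sense_counts(message):
    """Build a bag-of-senses feature vector for one message."""
    tokens = message.lower().split()
    return Counter(disambiguate(tokens))


if __name__ == "__main__":
    print(sense_counts("low rate loan from your bank"))
    print(sense_counts("fishing by the river bank"))
```

In this toy setting the two messages yield different features for the same surface word "bank", which a purely term-based representation would merge into a single feature.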