Using word similarity to eradicate junk emails

  • Authors:
  • Maria S. Pera; Yiu-Kai Ng

  • Affiliations:
  • Brigham Young University, Provo, UT; Brigham Young University, Provo, UT

  • Venue:
  • Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
  • Year:
  • 2007

Abstract

Email is one of the most commonly used communication media today; however, unsolicited emails obstruct this otherwise fast and convenient technology for information exchange and jeopardize the continuity of this popular communication tool. Wasted resources and time and exposure to offensive content are only a few of the problems that junk emails cause. In addition, the monetary cost of processing junk emails reaches billions of dollars per year and is absorbed by public users and Internet service providers. Although extensive past work has been dedicated to eradicating junk emails, none of the existing junk email detection approaches has been highly successful in solving these problems, since spammers have been able to circumvent existing detection techniques. In this paper, we present a new tool, JunEX, which relies on the content similarity of emails to eradicate junk emails. JunEX compares each incoming email to a core of emails marked as junk by each individual user to identify unwanted emails while reducing the number of legitimate emails treated as junk, which is critical. Experiments conducted on JunEX verify its high accuracy.
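
The idea of comparing an incoming email against a per-user core of junk emails can be illustrated with a minimal sketch. The abstract does not specify the similarity measure JunEX uses, so the sketch below substitutes plain Jaccard overlap of word sets as a stand-in. The function names, the sample emails, and the 0.5 threshold are all assumptions made for illustration, not details taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_junk(incoming, junk_core, threshold=0.5):
    """Flag `incoming` if it is similar enough to any email in the junk core.

    `threshold` is a hypothetical cutoff; a real system would tune it to
    keep the number of legitimate emails treated as junk low.
    """
    words = tokenize(incoming)
    return any(jaccard(words, tokenize(j)) >= threshold for j in junk_core)


# Hypothetical per-user core of emails previously marked as junk.
junk_core = [
    "Win a free prize now, click here to claim your free prize",
    "Cheap loans approved instantly, apply now",
]

print(is_junk("Click here now to claim your free prize", junk_core))
print(is_junk("Meeting moved to Tuesday at noon", junk_core))
```

Raising the threshold in a sketch like this trades missed junk for fewer legitimate emails misclassified, which is the trade-off the abstract describes as critical.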