Density-based spam detector

  • Authors: Kenichi YOSHIDA; Fuminori ADACHI; Takashi WASHIO; Hiroshi MOTODA; Teruaki HOMMA; Akihiro NAKASHIMA; Hiromitsu FUJIKAWA; Katsuyuki YAMAZAKI
  • Affiliations: University of Tsukuba, Tokyo, Japan; Osaka University, Osaka, Japan; Osaka University, Osaka, Japan; Osaka University, Osaka, Japan; KDDI Corporation, Tokyo, Japan; KDDI Corporation, Tokyo, Japan; KDDI R&D Laboratories Inc., Saitama, Japan; KDDI R&D Laboratories Inc., Saitama, Japan
  • Venue: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year: 2004

Abstract

The volume of mass unsolicited electronic mail, commonly known as spam, has increased enormously in recent years and has become a serious threat not only to the Internet but also to society. This paper proposes a new spam detection method that uses document space density information. Although the method requires extensive e-mail traffic to acquire the necessary information, an unsupervised learning engine with a short white list can achieve a 98% recall rate and 100% precision. A direct-mapped cache method enables the detector to handle over 13,000 e-mails per second. The paper also reports experimental results obtained from over 50 million actual e-mails.
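
The abstract does not describe the implementation, so the following is only a minimal sketch of how a direct-mapped cache could count near-identical e-mails in constant time per message. Everything in it is an assumption made for illustration: the names `DirectMappedCounter`, `fingerprint`, and `is_spam`, the slot count, the threshold, and especially the use of a hash of the whitespace-normalized body as a stand-in for the paper's document space density measure.

```python
import hashlib


class DirectMappedCounter:
    """Fixed-size table in which each key maps to exactly one slot.

    A colliding key overwrites the slot's previous occupant, so memory
    stays bounded and each update costs O(1). (Hypothetical helper.)
    """

    def __init__(self, num_slots=1024):
        self.num_slots = num_slots
        self.keys = [None] * num_slots
        self.counts = [0] * num_slots

    def observe(self, key):
        """Record one occurrence of key and return its current count."""
        slot = key % self.num_slots
        if self.keys[slot] == key:
            self.counts[slot] += 1
        else:
            # Direct-mapped replacement: evict whatever was in the slot.
            self.keys[slot] = key
            self.counts[slot] = 1
        return self.counts[slot]


def fingerprint(body):
    """Assumed stand-in for a document-space position: a 64-bit hash of
    the lowercased, whitespace-normalized body."""
    normalized = " ".join(body.lower().split())
    digest = hashlib.sha1(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def is_spam(sender, body, cache, white_list, threshold=3):
    """Flag a message once its fingerprint has been seen `threshold`
    times, unless the sender is on the white list (assumed policy)."""
    count = cache.observe(fingerprint(body))
    if sender in white_list:
        return False
    return count >= threshold
```

In this sketch a message is flagged only after enough copies have passed through the cache, which mirrors the abstract's remark that the approach needs extensive traffic before it has the information it relies on. The white list keeps legitimate bulk senders, such as mailing lists, from being flagged merely because their messages are numerous.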