Two phase approach for spam-mail filtering

  • Authors:
  • Sin-Jae Kang;Sae-Bom Lee;Jong-Wan Kim;In-Gil Nam

  • Affiliations:
  • School of Computer and Information Technology, Daegu University, Gyeongsan, Gyeongbuk, South Korea (all authors)

  • Venue:
  • CIS'04 Proceedings of the First international conference on Computational and Information Science
  • Year:
  • 2004

Abstract

This paper describes a two-phase method for filtering spam mail based on textual information and hyperlinks. Because the body of a spam mail often contains little text, it provides too few hints to distinguish spam from legitimate mail. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from both the original email body and the fetched webpages. We divide the hints into two kinds of information: definite information and less definite textual information. In our experiment, the method that fetches webpages improved the F-measure by 9.4% over the method that uses only the original email header and body.
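
The link-following step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`extract_links`, `fetch_page`, `html_to_text`, `tokenize`, `collect_features`), the URL regular expression, and the bag-of-words tokenization are all assumptions made for illustration, and the split into definite and less definite hints is not modeled here.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumed pattern: absolute http(s) URLs, stopping at whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Strip markup and return the visible text of an HTML (or plain-text) string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract_links(body):
    """Return every http(s) URL found in the raw email body."""
    return URL_PATTERN.findall(body)


def fetch_page(url, timeout=5.0):
    """Hypothetical default fetcher: any failure is treated as an empty page."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for real feature extraction."""
    return re.findall(r"[a-z0-9]+", text.lower())


def collect_features(body, fetch=fetch_page):
    """Tokens from the email body, followed by tokens from each linked page."""
    tokens = tokenize(html_to_text(body))
    for url in extract_links(body):
        tokens.extend(tokenize(html_to_text(fetch(url))))
    return tokens
```

The `fetch` parameter is injectable so the sketch can be exercised without network access, for example by passing a function that looks pages up in a dictionary. The resulting token list would then be handed to whatever classifier is used; the abstract does not specify one.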