Characterizing a spam traffic

  • Authors:
  • Luiz Henrique Gomes;Cristiano Cazita;Jussara M. Almeida;Virgílio Almeida;Wagner Meira, Jr.

  • Affiliations:
  • Federal University of Minas Gerais, Belo Horizonte - Brazil;Federal University of Minas Gerais, Belo Horizonte - Brazil;Federal University of Minas Gerais, Belo Horizonte - Brazil;Federal University of Minas Gerais, Belo Horizonte - Brazil;Federal University of Minas Gerais, Belo Horizonte - Brazil

  • Venue:
  • Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
  • Year:
  • 2004

Abstract

The rapid increase in the volume of unsolicited commercial e-mails, also known as spam, is beginning to take its toll on system administrators, business corporations and end-users. Widely varying estimates of the cost associated with spam are available in the literature. However, a quantitative analysis of the determinant characteristics of spam traffic is still an open problem. This work fills this gap and presents what we believe to be the first extensive characterization of spam traffic. As a basis for our characterization, standard spam detection techniques are used to classify over 360 thousand e-mails arriving at a large university into two categories, namely spam and non-spam. For each of the two resulting workloads, as well as for the aggregate workload, we analyze a set of parameters, aiming at identifying the characteristics that significantly distinguish spam from non-spam traffic, assessing the qualitative impact of spam on the aggregate traffic and, possibly, drawing insights into the design of more effective spam detection techniques. Our characterization reveals significant differences between the spam and non-spam traffic patterns. The e-mail arrival process, the size distribution, and the distributions of popularity and temporal locality of e-mail recipients are key workload aspects that distinguish spam from traditional e-mail traffic. We conjecture that these differences are a consequence of the inherently different modes of operation of spam and non-spam senders. Whereas non-spam e-mail transmissions are typically driven by bilateral social relationships, spam transmission is usually a unilateral action, based solely on the sender's will to reach as many users as possible.
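
The per-workload analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method or tooling: the record layout, the sample values, and the `characterize` helper are all hypothetical, and the spam label is assumed to have been assigned already by some detection step. The sketch only shows how inter-arrival times, message sizes, and recipient popularity might be summarized separately for the spam, non-spam, and aggregate workloads.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: (arrival_time_seconds, size_bytes, recipient, is_spam).
# The is_spam flag is assumed to come from an earlier classification step.
messages = [
    (0, 1200, "alice", False),
    (5, 4000, "bob", True),
    (7, 4100, "carol", True),
    (60, 900, "alice", False),
    (9, 3900, "dave", True),
    (200, 15000, "bob", False),
]

def characterize(records):
    """Summarize one workload: arrivals, sizes, and recipient popularity."""
    times = sorted(t for t, _, _, _ in records)
    # Inter-arrival times are gaps between consecutive arrivals.
    inter_arrivals = [b - a for a, b in zip(times, times[1:])]
    sizes = [s for _, s, _, _ in records]
    # Popularity: how many messages each recipient received.
    popularity = Counter(r for _, _, r, _ in records)
    return {
        "count": len(records),
        "mean_inter_arrival": mean(inter_arrivals) if inter_arrivals else None,
        "mean_size": mean(sizes) if sizes else None,
        "popularity": popularity.most_common(),
    }

spam = [m for m in messages if m[3]]
non_spam = [m for m in messages if not m[3]]

summary = {
    "spam": characterize(spam),
    "non_spam": characterize(non_spam),
    "aggregate": characterize(messages),
}

for name, stats in summary.items():
    print(name, stats)
```

A real characterization would compare full distributions rather than means, and would also need a measure of temporal locality of recipients, which this sketch leaves out.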