Identifying features to improve real time clustering and domain blacklisting

  • Authors:
  • Soma Halder;Richa Tiwari;Alan Sprague

  • Affiliations:
  • Univ. of Alabama at Birmingham, Birmingham, Alabama;Univ. of Alabama at Birmingham, Birmingham, Alabama;Univ. of Alabama at Birmingham, Birmingham, Alabama

  • Venue:
  • Proceedings of the 50th Annual Southeast Regional Conference
  • Year:
  • 2012

Abstract

Feature analysis is an important task in information extraction, since appropriate features improve the performance of any classification or clustering algorithm. In this paper we analyze features that can be used to cluster spam emails in real time and thus improve IP blacklisting. Domain blacklisting becomes easier when these features are used because large numbers of IP addresses are grouped together. The features we explore include the sender and subject of the email, email attachments, and stylistic and semantic features. These features support appropriate clustering of spam originating from dominant hosts. We measure the effectiveness of these features in terms of how well they group emails and gather domain/IP information, and thus how much they improve domain blacklisting.
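The idea of grouping spam by shared features and collecting the source addresses of each group can be illustrated with a minimal sketch. This is not the authors' method: the exact-match grouping on subject and attachment names, the dictionary field names (`subject`, `attachments`, `ip`), the function names, and the `min_size` threshold are all assumptions made for illustration only.

```python
from collections import defaultdict


def normalize_subject(subject):
    """Lowercase the subject and collapse runs of whitespace (assumed normalization)."""
    return " ".join(subject.lower().split())


def cluster_by_features(emails):
    """Group emails that share a normalized subject and the same attachment names.

    Each email is a dict with hypothetical keys: 'subject', 'attachments', 'ip'.
    Returns a dict mapping feature key -> {'count': int, 'ips': set of str}.
    """
    clusters = defaultdict(lambda: {"count": 0, "ips": set()})
    for msg in emails:
        key = (normalize_subject(msg["subject"]), tuple(sorted(msg["attachments"])))
        clusters[key]["count"] += 1
        clusters[key]["ips"].add(msg["ip"])
    return dict(clusters)


def blacklist_candidates(clusters, min_size=2):
    """Collect sender IPs from clusters holding at least min_size emails."""
    ips = set()
    for cluster in clusters.values():
        if cluster["count"] >= min_size:
            ips |= cluster["ips"]
    return sorted(ips)
```

Calling `cluster_by_features` on a batch of parsed messages and passing the result to `blacklist_candidates` yields the addresses that appear in the larger groups. A real-time system would likely need fuzzy matching and incremental updates rather than the exact-key grouping assumed here.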