Clustering potential phishing websites using DeepMD5

  • Authors:
  • Jason Britt;Brad Wardman;Alan Sprague;Gary Warner

  • Affiliations:
  • Department of Computer & Inf. Sciences, University of Alabama at Birmingham, Birmingham, AL;Department of Computer & Inf. Sciences, University of Alabama at Birmingham, Birmingham, AL;Department of Computer & Inf. Sciences, University of Alabama at Birmingham, Birmingham, AL;Department of Computer & Inf. Sciences, University of Alabama at Birmingham, Birmingham, AL

  • Venue:
  • LEET'12 Proceedings of the 5th USENIX conference on Large-Scale Exploits and Emergent Threats
  • Year:
  • 2012

Abstract

Phishing websites attempt to deceive people into exposing their passwords, user IDs, and other sensitive information by mimicking legitimate websites such as those of banks, product vendors, and service providers. Phishing websites are a pervasive and ongoing problem. Examining and analyzing a phishing website is a good first step in an investigation, but it is labor-intensive, and manually analyzing a large, continuous feed of phishing websites would be almost insurmountable because of the time and labor required. Automated methods are needed that group large volumes of phishing website data so that investigators can focus their efforts on the largest groupings, which represent the most prevalent phishing groups or individuals. This paper describes an attempt to create such a method. The method is based on the assumption that a phishing website attacking a particular brand is often reused many times by a particular group or individual, and that when the targeted brand changes, a new phishing website is not created from scratch; rather, incremental upgrades are made to the original. The method employs a SLINK-style clustering algorithm that uses local domain file commonality between websites as a distance metric. It produces clusters of phishing websites that target the same brand and that, evidence suggests, were created by the same phishing group or individual.
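
The clustering idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the DeepMD5 scoring formula or the linkage threshold, so everything specific here is an assumption made for illustration: the Jaccard overlap of per-file MD5 hashes stands in for the paper's file-commonality measure, the 0.5 threshold is arbitrary, and the function names (file_hashes, similarity, cluster) and sample sites are hypothetical. The sketch also does not implement SLINK itself; it computes the equivalent flat result of single-linkage clustering cut at a fixed threshold, using a union-find structure.

```python
import hashlib
from itertools import combinations


def file_hashes(files):
    """Map a site's local files (name -> bytes) to a set of MD5 hex digests."""
    return {hashlib.md5(content).hexdigest() for content in files.values()}


def similarity(a, b):
    """Jaccard overlap of two hash sets (assumed stand-in for the paper's score)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(sites, threshold=0.5):
    """Single-linkage grouping: sites join a cluster if any member is similar enough."""
    hashes = {name: file_hashes(files) for name, files in sites.items()}
    parent = {name: name for name in sites}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sites, 2):
        if similarity(hashes[a], hashes[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in sites:
        groups.setdefault(find(name), set()).add(name)
    # Largest clusters first, so the most prevalent groupings surface at the top.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical sample data: site_b reuses two of site_a's three files.
sites = {
    "site_a": {"index.html": b"<html>bank login</html>",
               "style.css": b"body{}", "logo.gif": b"GIF1"},
    "site_b": {"index.html": b"<html>bank login v2</html>",
               "style.css": b"body{}", "logo.gif": b"GIF1"},
    "site_c": {"index.php": b"<?php other kit ?>", "main.css": b"p{}"},
}
clusters = cluster(sites)
```

With this sample data, site_a and site_b share two of four distinct file hashes, which meets the assumed threshold, so they form one cluster while site_c remains on its own. Sorting clusters by size reflects the abstract's goal of directing investigators to the largest groupings first.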