Phishing detection with popular search engines: simple and effective

  • Authors:
  • Jun Ho Huh;Hyoungshick Kim

  • Affiliations:
  • Information Trust Institute, University of Illinois at Urbana-Champaign;Computer Laboratory, University of Cambridge, UK

  • Venue:
  • FPS'11 Proceedings of the 4th Canada-France MITACS conference on Foundations and Practice of Security
  • Year:
  • 2011

Abstract

We propose a new phishing detection heuristic based on the search results returned by popular web search engines such as Google, Bing and Yahoo. The full URL of the website a user intends to access is used as the search string, and the number of results returned and the ranking of the website are used for classification. Most of the time, legitimate websites get back a large number of results and are ranked first, whereas phishing websites get back no results and/or are not ranked at all. To demonstrate the effectiveness of our approach, we experimented with four well-known classification algorithms (Linear Discriminant Analysis, Naïve Bayesian, K-Nearest Neighbour, and Support Vector Machine) and observed their performance. The K-Nearest Neighbour algorithm performed best, achieving a true positive rate of 98% and false positive and false negative rates of 2%. We used new legitimate websites and phishing websites as our dataset to show that our approach works well even on newly launched websites and webpages; such websites are often misclassified by existing blacklisting and whitelisting approaches.
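
The heuristic described in the abstract can be illustrated with a minimal sketch. It assumes that the two search-derived signals (result count and rank of the queried URL) have already been obtained from a search engine; no search API is called here. The feature scaling in `to_features`, the choice of `k=3`, the helper names, and every number in `TRAIN` are hypothetical values chosen for illustration, not the paper's dataset, features, or parameters.

```python
import math
from collections import Counter


def to_features(num_results, rank):
    """Map raw search output to a 2-D feature vector (an assumed encoding).

    num_results: number of hits returned when the full URL is the query.
    rank: 1-based position of the URL in the results, or 0 if not ranked.
    """
    # Log-scale the count so very popular sites do not dominate the distance.
    scaled_count = math.log10(num_results + 1)
    # Reciprocal rank: 1.0 when ranked first, 0.0 when not ranked at all.
    rank_score = 1.0 / rank if rank > 0 else 0.0
    return (scaled_count, rank_score)


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training examples, for illustration only.
TRAIN = [
    (to_features(120000, 1), "legitimate"),
    (to_features(8500, 1), "legitimate"),
    (to_features(430, 2), "legitimate"),
    (to_features(0, 0), "phishing"),
    (to_features(0, 0), "phishing"),
    (to_features(3, 0), "phishing"),
]

if __name__ == "__main__":
    # A URL with many results that is ranked first.
    print(knn_classify(TRAIN, to_features(50000, 1)))
    # A URL with no results that is not ranked.
    print(knn_classify(TRAIN, to_features(0, 0)))
```

A plain nearest-neighbour vote is used only because the abstract reports K-Nearest Neighbour as the best performer; any of the other three classifiers could be substituted on the same two features.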