A hierarchical adaptive probabilistic approach for zero hour phish detection

Authors:
Guang Xiang; Bryan A. Pendleton; Jason Hong; Carolyn P. Rose
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (all authors)
Venue:
ESORICS'10: Proceedings of the 15th European Conference on Research in Computer Security
Year:
2010

Abstract

Phishing attacks are a significant threat to users of the Internet, causing tremendous economic loss every year. In combating phish, industry relies heavily on manual verification to achieve a low false positive rate; however, manual verification tends to be slow to respond to the huge volume of unique phishing URLs created by toolkits. Our goal here is to combine the best aspects of human-verified blacklists and heuristic-based methods: the low false positive rate of the former and the broad, fast coverage of the latter. To this end, we present the design and evaluation of a hierarchical, blacklist-enhanced phish detection framework. The key insight behind our detection algorithm is to leverage existing human-verified blacklists and apply shingling, a popular near-duplicate detection algorithm used by search engines, to detect phish probabilistically with very high accuracy. To achieve an extremely low false positive rate, our layered system includes a filtering module that harnesses search engines via information retrieval techniques to correct false positives. Comprehensive experiments over a diverse spectrum of data sources show that our method achieves a 0% false positive rate (FP) with a true positive rate (TP) of 67.15% using search-oriented filtering, and 0.03% FP with 73.53% TP without the filtering module. With incremental model building via a sliding window mechanism, our approach adapts quickly to new phishing variants and is thus more responsive to evolving attacks.
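The abstract names two mechanisms, shingling-based near-duplicate matching against human-verified phish and a sliding window that keeps the model current, without giving their parameters. The following is a minimal sketch of how those two pieces could fit together, not the authors' implementation. It assumes word 3-gram shingles, a fixed similarity threshold of 0.6, and a window of the most recent verified pages; the class and method names (`ShingleBlacklist`, `add_verified_phish`, `score`, `is_phish`) are hypothetical. It also swaps in plain maximum Jaccard resemblance with a threshold in place of the paper's probabilistic scoring, and it does not model the search-engine filtering module.

```python
import re
from collections import deque


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short pages collapse to a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard resemblance |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class ShingleBlacklist:
    """Hypothetical sliding-window store of shingle sets from verified phish.

    Assumed parameters: window (number of recent verified pages kept),
    k (shingle length in words) and threshold (minimum resemblance).
    """

    def __init__(self, window=1000, k=3, threshold=0.6):
        self.k = k
        self.threshold = threshold
        # deque(maxlen=...) evicts the oldest entry once the window is full,
        # so the model is rebuilt incrementally as new phish are verified.
        self.pages = deque(maxlen=window)

    def add_verified_phish(self, page_text):
        """Add the text of a human-verified phishing page to the window."""
        self.pages.append(shingles(page_text, self.k))

    def score(self, page_text):
        """Highest resemblance between page_text and any page in the window."""
        s = shingles(page_text, self.k)
        return max((jaccard(s, p) for p in self.pages), default=0.0)

    def is_phish(self, page_text):
        """Flag page_text if it is a near-duplicate of a verified phish."""
        return self.score(page_text) >= self.threshold
```

For example, a toolkit-generated variant that changes only one word of a verified page still shares 7 of its 9 distinct shingles with it:

```python
bl = ShingleBlacklist(window=2, k=3, threshold=0.5)
bl.add_verified_phish("Please verify your account password at PayPal login page now")
candidate = "Please verify your account password at PayPal login page today"
print(round(bl.score(candidate), 2))  # 0.78
print(bl.is_phish(candidate))         # True
```

Because the window is bounded, old entries age out as new variants are added, which is one simple way to get the incremental, adaptive behavior the abstract describes.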