New filtering approaches for phishing email

Authors:
André Bergholz; Jan De Beer; Sebastian Glahn; Marie-Francine Moens; Gerhard Paaß; Siehyun Strobel
Affiliations:
(Corresponding author. Tel.: +49 2241 14 3021; Fax: +49 2241 14 43021; E-mail: andre.bergholz@iais.fraunhofer.de) Fraunhofer IAIS, Schloß Birlinghoven, 53754 St. Augustin, Germany; Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001 Heverlee, Belgium; Fraunhofer IAIS, Schloß Birlinghoven, 53754 St. Augustin, Germany; Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001 Heverlee, Belgium; Fraunhofer IAIS, Schloß Birlinghoven, 53754 St. Augustin, Germany; Fraunhofer IAIS, Schloß Birlinghoven, 53754 St. Augustin, Germany
Venue:
Journal of Computer Security - EU-Funded ICT Research on Trust and Security
Year:
2010

Abstract

Phishing emails usually contain a message from a credible-looking source asking the user to click a link to a website, where he or she is asked to enter a password or other confidential information. Most phishing emails aim at withdrawing money from financial institutions or gaining access to private information. Phishing has increased enormously in recent years and is a serious threat to global security and the economy. There are a number of possible countermeasures to phishing. These range from communication-oriented approaches such as authentication protocols, through blacklisting, to content-based filtering. We argue that the first two approaches are currently either not broadly implemented or exhibit deficits. Content-based phishing filters are therefore necessary and widely used to increase communication security. Such a filter first extracts a number of features capturing the content and structural properties of an email. A statistical classifier is then trained on these features using a training set of emails labeled as ham (legitimate), spam or phishing. The classifier may then be applied to an email stream to estimate the classes of incoming emails. In this paper we describe a number of novel features that are particularly well suited to identifying phishing emails. These include statistical models for low-dimensional descriptions of email topics, sequential analysis of email text and external links, the detection of embedded logos, and indicators for hidden salting. Hidden salting is the intentional addition or distortion of content in a way not perceivable by the reader. For empirical evaluation we have obtained a large realistic corpus of emails prelabeled as spam, phishing and ham. In experiments our methods outperform other published approaches for classifying phishing emails. We discuss the implications of these results for the practical application of this approach in the workflow of an email provider. Finally, we describe a strategy for updating the filters and adapting them to new types of phishing.
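
The filtering pipeline outlined in the abstract (extract content and structural features, train a statistical classifier on emails labeled ham, spam or phishing, then classify incoming mail) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name a classifier, so a multinomial naive Bayes model is substituted here, and the three structural indicators (`has_link`, `ip_link`, `hidden_style`) and the toy training emails are hypothetical stand-ins, not the features or corpus of the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Crude, hypothetical indicators; the paper's actual features are far richer.
LINK_RE = re.compile(r'href\s*=\s*["\']?(https?://[^"\'\s>]+)', re.I)
IP_URL_RE = re.compile(r'https?://\d{1,3}(?:\.\d{1,3}){3}')
HIDDEN_RE = re.compile(r'display\s*:\s*none|visibility\s*:\s*hidden', re.I)
TAG_RE = re.compile(r'<[^>]+>')


def extract_features(email_html):
    """Map an email body to counts of content and structural features."""
    feats = Counter()
    text = TAG_RE.sub(' ', email_html).lower()
    # Content features: words of three or more letters.
    feats.update('w=' + w for w in re.findall(r'[a-z]{3,}', text))
    # Structural features: links, raw-IP links, invisible styling.
    links = LINK_RE.findall(email_html)
    if links:
        feats['has_link'] += 1
    if any(IP_URL_RE.match(url) for url in links):
        feats['ip_link'] += 1
    if HIDDEN_RE.search(email_html):
        feats['hidden_style'] += 1  # rough proxy for hidden salting
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, emails, labels):
        for email, label in zip(emails, labels):
            feats = extract_features(email)
            self.class_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, email):
        feats = extract_features(email)
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.feat_counts[label].values())
            score = math.log(n_label / n_docs)
            for feat, count in feats.items():
                if feat not in self.vocab:
                    continue  # ignore features never seen in training
                p = (self.feat_counts[label][feat] + self.alpha) / (
                    total + self.alpha * vocab_size)
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (invented for this sketch).
TRAINING = [
    ("Hi team, the meeting notes are attached. See you Monday.", "ham"),
    ("Lunch on Friday? The project report is ready.", "ham"),
    ("Cheap pills, huge discount, buy now "
     "<a href='http://shop.example.com'>shop</a>", "spam"),
    ("Win a free prize, discount offer, buy now", "spam"),
    ("Dear customer, your account is suspended. Verify your password at "
     "<a href='http://192.0.2.7/login'>your bank</a>"
     "<span style='display:none'>meeting notes</span>", "phishing"),
    ("Your bank account needs verification. Confirm your password "
     "<a href='http://198.51.100.4/verify'>here</a>", "phishing"),
]

clf = NaiveBayes().fit([e for e, _ in TRAINING], [l for _, l in TRAINING])
incoming = ("Please verify your account password now "
            "<a href='http://203.0.113.9/bank'>bank</a>")
print(clf.predict(incoming))
```

In a deployment of the kind the abstract describes, `extract_features` would be the place to add the richer signals it lists (topic models, sequential text and link analysis, logo detection, hidden-salting indicators), and periodic retraining on newly labeled mail would be one plausible way to adapt the filter to new types of phishing.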