Phishing emails usually contain a message from a credible-looking source asking the user to click a link to a website, where he or she is asked to enter a password or other confidential information. Most phishing emails aim at withdrawing money from financial institutions or gaining access to private information. Phishing has increased enormously in recent years and is a serious threat to global security and the economy. There are a number of possible countermeasures to phishing, ranging from communication-oriented approaches such as authentication protocols, through blacklisting, to content-based filtering. We argue that the first two approaches are currently not broadly implemented or exhibit deficits. Content-based phishing filters are therefore necessary and widely used to increase communication security. Such a filter extracts a number of features capturing the content and structural properties of an email. A statistical classifier is then trained on these features using a training set of emails labeled as ham (legitimate), spam, or phishing. The trained classifier may then be applied to an email stream to estimate the classes of new incoming emails. In this paper we describe a number of novel features that are particularly well suited to identifying phishing emails. These include statistical models for low-dimensional descriptions of email topics, sequential analysis of email text and external links, the detection of embedded logos, and indicators of hidden salting. Hidden salting is the intentional addition or distortion of content that is not perceivable by the reader. For empirical evaluation we obtained a large, realistic corpus of emails prelabeled as spam, phishing, and ham. In experiments our methods outperform other published approaches for classifying phishing emails. We discuss the implications of these results for the practical application of this approach in the workflow of an email provider. Finally, we describe a strategy for updating the filters and adapting them to new types of phishing.
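
The extract-features, train, then classify workflow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the method of the paper: the four features (link count, links whose visible text shows a different URL than the target, links to raw IP addresses, and a crude hidden-text count standing in for a hidden-salting indicator), the function names, the toy training emails, and the nearest-centroid rule used in place of a real statistical classifier.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude patterns; real emails need a proper HTML parser.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
HIDDEN_RE = re.compile(r'display\s*:\s*none|font-size\s*:\s*0', re.I)
IP_HOST_RE = re.compile(r'^https?://\d{1,3}(?:\.\d{1,3}){3}', re.I)


def extract_features(body):
    """Map an email body to a small numeric feature vector (illustrative features)."""
    links = LINK_RE.findall(body)
    # Visible link text that looks like a URL but does not occur in the real target.
    mismatched = sum(
        1 for href, text in links
        if text.strip().lower().startswith("http") and text.strip() not in href
    )
    ip_links = sum(1 for href, _ in links if IP_HOST_RE.match(href))
    # Styles that hide content from the reader: a stand-in for a salting indicator.
    hidden = len(HIDDEN_RE.findall(body))
    return [float(len(links)), float(mismatched), float(ip_links), float(hidden)]


def train_centroids(examples):
    """Average the feature vectors of each class; examples yields (body, label)."""
    sums, counts = {}, defaultdict(int)
    for body, label in examples:
        vec = extract_features(body)
        prev = sums.get(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(prev, vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in tot] for lab, tot in sums.items()}


def classify(body, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = extract_features(body)

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


# Toy training set, invented for this sketch.
TRAIN = [
    ('Lunch tomorrow? <a href="https://example.org/agenda">agenda</a>', "ham"),
    ("See you at the meeting on Monday.", "ham"),
    ('Cheap pills! <a href="http://pills.example.com">buy</a> '
     '<a href="http://pills.example.com/x">now</a> '
     '<a href="http://pills.example.com/y">deal</a>', "spam"),
    ('Your account is locked. '
     '<a href="http://192.0.2.7/login">http://bank.example.com</a>'
     '<span style="font-size:0">qz xk</span>', "phishing"),
]

CENTROIDS = train_centroids(TRAIN)

incoming = ('Verify now: <a href="http://198.51.100.9/secure">'
            'https://pay.example.com</a><div style="display:none">filler</div>')
print(classify(incoming, CENTROIDS))
```

The nearest-centroid rule is chosen only because it needs no external library. In practice, any classifier that accepts numeric feature vectors could take its place, and the feature extractor would be replaced by the richer features the paper describes.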