Detecting phishing e-mails by heterogeneous classification

Authors:
M. Dolores del Castillo; Angel Iglesias; J. Ignacio Serrano
Affiliations:
Instituto de Automática Industrial, CSIC, Madrid, Spain; Instituto de Automática Industrial, CSIC, Madrid, Spain; Instituto de Automática Industrial, CSIC, Madrid, Spain
Venue:
IDEAL'07: Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning
Year:
2007

Abstract

This paper presents a system that classifies e-mails into two categories: legitimate and fraudulent. The classifier applies three filters in series: a Bayesian filter that classifies the textual content of e-mails; a rule-based filter that classifies their non-grammatical content; and, finally, a filter built on an emulator of fictitious accesses, which classifies the responses of websites referenced by links in the e-mails. The approach is hybrid, because it combines different classification methods, and integrated, because it takes into account all kinds of data and information contained in e-mails. It aims to make classification both effective and efficient: the system first applies fast, reliable classification methods, and applies more complex analysis and classification methods only when the resulting decision is imprecise.
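The serial, escalate-only-when-unsure design can be illustrated with a small cascade. The Python sketch below is not the authors' implementation; it rests on several assumptions of our own: each filter returns a probability that the e-mail is fraudulent, a decision counts as "imprecise" when that probability lies between two hypothetical thresholds (`low=0.2`, `high=0.8`), the Bayesian filter is a plain multinomial naive Bayes, the two link rules in `rule_stage` are made-up examples, and `emulated_access_stage` is a stub that reads a hypothetical `site_accepts_fake_login` field instead of actually visiting websites. All names (`NaiveBayesFilter`, `classify`, `STAGES`, and so on) are illustrative.

```python
import math
import re
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple
from urllib.parse import urlparse

LEGIT, PHISH = "legitimate", "fraudulent"
Email = Dict[str, Any]  # hypothetical keys: "text", "links", "site_accepts_fake_login"
Stage = Tuple[str, Callable[[Email], float]]  # (name, function returning P(fraudulent))


class NaiveBayesFilter:
    """Stage 1: multinomial naive Bayes over word tokens, with Laplace smoothing."""

    def __init__(self) -> None:
        self.word_counts = {LEGIT: Counter(), PHISH: Counter()}
        self.doc_counts: Counter = Counter()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return re.findall(r"[a-z']+", text.lower())

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayesFilter":
        for text, label in docs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(self.tokenize(text))
        return self

    def prob_phish(self, text: str) -> float:
        vocab = set(self.word_counts[LEGIT]) | set(self.word_counts[PHISH])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in (LEGIT, PHISH):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)
            # Smoothed log prior plus one smoothed log-likelihood term per token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for tok in self.tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            log_score[label] = score
        # P(fraudulent | text); the exponent is clamped to avoid overflow.
        diff = max(min(log_score[LEGIT] - log_score[PHISH], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


IP_URL = re.compile(r"^https?://\d{1,3}(?:\.\d{1,3}){3}(?::|/|$)")
DOMAIN_LIKE = re.compile(r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)", re.IGNORECASE)


def rule_stage(email: Email) -> float:
    """Stage 2: two illustrative rules over links, not over the prose."""
    hits = 0
    for anchor, href in email.get("links", []):
        if IP_URL.match(href):  # link targets a raw IP address
            hits += 1
        shown = DOMAIN_LIKE.match(anchor.strip())
        host = (urlparse(href).hostname or "").lower()
        if shown and shown.group(1).lower() != host:  # visible domain differs from target
            hits += 1
    if hits == 0:
        return 0.1
    return 0.9 if hits >= 2 else 0.5


def emulated_access_stage(email: Email) -> float:
    """Stage 3 stub: a real emulator would visit each link and submit fictitious
    credentials; a precomputed hypothetical flag stands in for that response."""
    return 0.95 if email.get("site_accepts_fake_login") else 0.05


def classify(email: Email, stages: List[Stage],
             low: float = 0.2, high: float = 0.8) -> Tuple[str, str]:
    """Run stages in order and stop at the first decisive score."""
    p = 0.5
    for name, stage in stages:
        p = stage(email)
        if p <= low:
            return LEGIT, name
        if p >= high:
            return PHISH, name
    return (PHISH if p >= 0.5 else LEGIT), "fallback"  # no stage was decisive


nb = NaiveBayesFilter().fit([
    ("verify your account password now", PHISH),
    ("your bank account is suspended verify now", PHISH),
    ("click to confirm your password", PHISH),
    ("meeting agenda for tomorrow", LEGIT),
    ("lunch with the team tomorrow", LEGIT),
    ("project report attached for review", LEGIT),
])
STAGES: List[Stage] = [
    ("bayes", lambda e: nb.prob_phish(e["text"])),
    ("rules", rule_stage),
    ("emulator", emulated_access_stage),
]

print(classify({"text": "please verify your password now"}, STAGES))  # ('fraudulent', 'bayes')
print(classify({"text": "hello", "links": [("www.mybank.com", "http://192.168.0.7/login")]},
               STAGES))  # ('fraudulent', 'rules')
```

Ordering the stages from cheapest to most expensive means that most messages are settled by the text filter, and only the ambiguous ones reach the emulator, which in a real deployment would be the slowest step because it contacts remote websites.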