Applying clustering and ensemble clustering approaches to phishing profiling

Authors:
John Yearwood;Dean Webb;Liping Ma;Peter Vamplew;Bahadorreza Ofoghi;Andrei Kelarev
Affiliations:
University of Ballarat, Ballarat, Australia (all authors)
Venue:
AusDM '09 Proceedings of the Eighth Australasian Data Mining Conference - Volume 101
Year:
2009


Abstract

This paper describes a novel approach to profiling phishing emails based on combining multiple independent clusterings of the email documents. Each clustering is motivated by a natural representation of the emails. A data set of 2048 phishing emails provided by a major Australian financial institution was pre-processed to extract features describing the textual content, hyperlinks and orthographic structure of the emails. Independent clusterings using different techniques were performed on each representation, and these clusterings were then combined into an ensemble using a variety of consensus functions. The paper concentrates on using several clustering approaches to determine the most likely number of phishing groups, and explores how the individual and combined results relate. The approach suggests a number of phishing groups, and its structure can aid the development of profiles based on the individual clusters. The actual profiling is not carried out in this paper.
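The abstract does not say which consensus functions were used, so the following is only a minimal sketch of one common option, a co-association (evidence accumulation) consensus. It assumes each representation (text, hyperlinks, orthographic structure) has already produced one label per email. The function names, the 0.5 threshold and the six-email toy labels are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions that put each pair of items in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Merge items that co-occur in more than `threshold` of the partitions.

    Equivalent to taking connected components of the thresholded
    co-association graph (an assumed, illustrative consensus function).
    """
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Hypothetical cluster labels for six emails, one list per representation.
text_labels = [0, 0, 0, 1, 1, 1]
link_labels = [1, 1, 0, 0, 0, 0]
orth_labels = [0, 0, 0, 1, 1, 2]

combined = consensus([text_labels, link_labels, orth_labels])
n_groups = len(set(combined))
print(combined, n_groups)
```

Because the co-association matrix only records whether two emails share a cluster, it does not matter that the label values differ between the individual clusterings. The number of distinct labels in `combined` gives one estimate of the number of groups under these assumptions.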