Language-model-based detection cascade for efficient classification of image-based spam e-mail

Authors:
Jen-Hao Hsia; Ming-Syan Chen
Affiliations:
Dept. of Electrical Engineering, National Taiwan University, Taipei, Taiwan; Dept. of Electrical Engineering, National Taiwan University and Institute of Information Science, Academia Sinica, Taipei, Taiwan
Venue:
ICME'09: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo
Year:
2009

Abstract

A new challenge in spam e-mail detection is the emergence of image spam, which embeds advertising messages in attached images to defeat conventional text-based anti-spam technologies. New techniques are needed to filter these messages. In this paper, we propose a prototype system that automatically classifies an image directly as spam or ham. The proposed method extracts latent topics from images to train a binary classifier for detecting spam images, and achieves higher detection accuracy than conventional anti-spam approaches. In addition, a detection cascade is proposed to further reduce the computational overhead of the spam filter. Our algorithm is evaluated experimentally on a public spam image dataset and is shown to significantly improve both detection accuracy and execution efficiency over the baseline approach.
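
The abstract does not give the stages, features, or thresholds of the detection cascade, so the following is only a generic sketch of the cascade idea and not the authors' method. It assumes each stage returns an estimated spam probability, that stages are ordered from cheapest to most expensive, and that an image leaves the cascade as soon as a stage is confident. The names `cascade_classify`, `cheap_stage`, `costly_stage`, and the thresholds `low` and `high` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps a feature vector to an estimated spam probability in [0, 1].
Stage = Callable[[Sequence[float]], float]


def cascade_classify(
    features: Sequence[float],
    stages: List[Stage],
    low: float = 0.1,    # assumed threshold: at or below this, call it ham
    high: float = 0.9,   # assumed threshold: at or above this, call it spam
) -> Tuple[str, int]:
    """Return (label, index of the stage that made the decision)."""
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    score = 0.5
    for index, stage in enumerate(stages):
        score = stage(features)
        if score <= low:
            return "ham", index      # confident early exit, later stages skipped
        if score >= high:
            return "spam", index     # confident early exit, later stages skipped
    # No stage was confident: fall back to the last stage's score.
    return ("spam" if score >= 0.5 else "ham"), len(stages) - 1


# Toy stand-ins for real classifiers (purely illustrative).
def cheap_stage(features: Sequence[float]) -> float:
    return features[0]                      # looks at a single feature


def costly_stage(features: Sequence[float]) -> float:
    return sum(features) / len(features)    # looks at every feature


if __name__ == "__main__":
    stages = [cheap_stage, costly_stage]
    print(cascade_classify([0.95, 0.2], stages))  # decided by the cheap stage
    print(cascade_classify([0.5, 0.9], stages))   # falls through to the costly stage
```

The saving comes from the early exits: inputs that the cheap stage can decide never reach the expensive one, so the average cost per image drops while hard cases still receive the full analysis.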