Detecting image spam using visual features and near duplicate detection

Authors:
Bhaskar Mehta;Saurabh Nangia;Manish Gupta;Wolfgang Nejdl
Affiliations:
Google Inc., Zurich, Switzerland;IIT Guwahati, Guwahati, India;IIT Guwahati, Guwahati, India;Forschungszentrum L3S, Hannover, Germany
Venue:
Proceedings of the 17th international conference on World Wide Web
Year:
2008

Abstract

Email spam is a much-studied topic. Although current spam-detection software has been gaining a competitive edge against text-based email spam, advances in spam generation have posed a new challenge: image-based spam. Image-based spam is email that carries the spam message inside embedded images, so the message is present only as binary image data, not as text. In this paper, we study the characteristics of image spam and propose two solutions for detecting it, comparing them with existing techniques. The first solution uses visual features for classification: SVMs (Support Vector Machines) are trained on judiciously chosen color, texture and shape features. It offers an accuracy of about 98%, an improvement of at least 6% over existing solutions. The second solution offers a novel approach to near-duplicate detection in images. It clusters image GMMs (Gaussian Mixture Models) based on the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon (JS) divergence as the distance measure.
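
The clustering idea in the second solution can be illustrated with a minimal sketch. It is not the authors' implementation: the paper clusters image GMMs, whereas this sketch assumes each image is already summarized as a small discrete distribution (for example a normalized color histogram), for which the JS divergence has a simple closed form. The function names `kl`, `js`, `merge_cost` and `aib` are hypothetical, and the merge cost follows the standard agglomerative information bottleneck formulation (cluster weight times weighted JS divergence).

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q, wp=0.5, wq=0.5):
    """Weighted Jensen-Shannon divergence; wp and wq are positive and sum to 1."""
    m = [wp * pi + wq * qi for pi, qi in zip(p, q)]
    return wp * kl(p, m) + wq * kl(q, m)

def merge_cost(ci, cj):
    """Information lost by merging two clusters, each given as (weight, distribution)."""
    (wi, pi), (wj, pj) = ci, cj
    w = wi + wj
    return w * js(pi, pj, wi / w, wj / w)

def aib(hists, n_clusters):
    """Greedy agglomerative clustering: repeatedly merge the cheapest pair."""
    n = len(hists)
    # Each cluster is (prior weight, distribution, member indices).
    clusters = [(1.0 / n, list(h), [i]) for i, h in enumerate(hists)]
    while len(clusters) > n_clusters:
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]][:2], clusters[ij[1]][:2]),
        )
        (wa, pa, ma), (wb, pb, mb) = clusters[a], clusters[b]
        w = wa + wb
        # The merged distribution is the weight-averaged mixture of the two.
        merged = (w, [(wa * x + wb * y) / w for x, y in zip(pa, pb)], ma + mb)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c[2]) for c in clusters]

# Illustrative data (made up): two pairs of near-duplicate histograms.
hists = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.18, 0.70],
]
groups = aib(hists, n_clusters=2)
```

With these made-up histograms, the two near-identical pairs end up in separate clusters. For GMMs the JS divergence has no closed form, so a faithful version would need an approximation in place of `js`; the merging loop itself would stay the same.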