Spam Filtering Based On The Analysis Of Text Information Embedded Into Images

Authors:
Giorgio Fumera;Ignazio Pillai;Fabio Roli
Venue:
The Journal of Machine Learning Research
Year:
2006

Abstract

In recent years, anti-spam filters have become necessary tools for Internet service providers to cope with the continuously growing volume of spam. Current server-side anti-spam filters are made up of several modules, each aimed at detecting different features of spam e-mails. In particular, researchers have investigated text categorisation techniques for designing modules that analyse the semantic content of e-mails, because these techniques potentially generalise better than the manually derived classification rules used in current server-side filters. However, spammers have very recently introduced a new trick: embedding the spam message in attached images, which can render ineffective all current techniques based on analysing the digital text in the subject and body fields of e-mails. In this paper we propose an approach to anti-spam filtering that exploits the text information embedded in images sent as attachments. Our approach applies state-of-the-art text categorisation techniques to text extracted by OCR tools from images attached to e-mails. The effectiveness of the proposed approach is experimentally evaluated on two large corpora of spam e-mails.
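
The pipeline described in the abstract (OCR on image attachments, then text categorisation of the recovered text) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a classifier or an OCR tool, so a small multinomial naive Bayes model stands in for the "state-of-the-art text categorisation techniques", and the OCR step is passed in as a hypothetical `ocr` callable. All names below (`NaiveBayesTextFilter`, `message_text`, `is_spam`) are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Iterable


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextFilter:
    """Toy multinomial naive Bayes text categoriser (a stand-in, not the paper's model)."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one labelled document; label must be "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_log_odds(self, text: str) -> float:
        """Return log P(spam | text) - log P(ham | text), with add-one smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        score = 0.0
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            total_words = sum(self.word_counts[label].values())
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for tok in tokens:
                # The extra +1 in the denominator reserves mass for unseen tokens.
                log_p += math.log(
                    (self.word_counts[label][tok] + 1) / (total_words + len(vocab) + 1)
                )
            score += sign * log_p
        return score


def message_text(body: str, images: Iterable[bytes], ocr: Callable[[bytes], str]) -> str:
    """Concatenate the body text with the text the OCR callable recovers from each image."""
    return " ".join([body, *(ocr(img) for img in images)])


def is_spam(
    clf: NaiveBayesTextFilter,
    body: str,
    images: Iterable[bytes],
    ocr: Callable[[bytes], str],
) -> bool:
    """Classify the message on body text plus OCR-extracted text."""
    return clf.spam_log_odds(message_text(body, images, ocr)) > 0.0
```

In practice the `ocr` argument would wrap a real OCR engine; here it is left abstract so the sketch stays self-contained. The point of the design is that a message with an empty body and a spam image carries no digital text for the filter to score unless the text recovered from the image is added to its input.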