Fighting unicode-obfuscated spam

Authors:
Changwei Liu;Sid Stamm
Affiliations:
Indiana University;Indiana University
Venue:
Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit
Year:
2007

Abstract

In recent years, spammers have increasingly used obfuscation to get spam emails past filters. The standard method is to use images that look like text, since typical spam filters cannot parse such messages; this is the technique used in so-called "rock phishing". To fight image-based spam, many spam filters apply heuristic rules that flag emails containing images; because few legitimate emails consist mainly of one large image, these rules help detect image-based spam. Spammers therefore have an incentive to circumvent them. Unicode transliteration is a convenient tool for this purpose: Unicode contains many distinct characters that look very similar, so a spammer can substitute a message's characters at random, producing a large number of visually near-identical variants of the same message and hiding blacklisted words from filters. To defend against such Unicode-obfuscated spam, we developed a prototype tool that works with SpamAssassin and blocks spam obfuscated in this way by mapping polymorphic messages to a common, more homogeneous representation, which can then be filtered using traditional methods. We demonstrate the ease with which Unicode polymorphism can be used to circumvent spam filters such as SpamAssassin, and then describe a de-obfuscation technique that catches messages obfuscated in this fashion.
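
The idea of mapping look-alike variants to one common representation can be illustrated with a minimal Python sketch. This is not the authors' prototype, and the abstract does not describe its internals; the `CONFUSABLES` table and the `deobfuscate` function are hypothetical names, and the table is a tiny hand-picked sample rather than a complete list of similar-looking characters. The sketch assumes that the filter's word lists are written in plain ASCII.

```python
import unicodedata

# Hypothetical, hand-picked sample of look-alike characters mapped to ASCII.
# A real tool would need a far larger table.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
    "\u0445": "x",  # Cyrillic small letter ha
    "\u0456": "i",  # Cyrillic small letter Byelorussian-Ukrainian i
    "\u03bf": "o",  # Greek small letter omicron
}


def deobfuscate(text: str) -> str:
    """Map a possibly obfuscated message to a common ASCII-leaning form."""
    # NFKC folds compatibility variants (for example fullwidth letters)
    # into their ordinary equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # Replace any remaining look-alike characters found in the sample table.
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)


# Two differently obfuscated variants collapse to the same string.
variant_a = "fr\u0435\u0435 \u043eff\u0435r"   # Cyrillic ie and o
variant_b = "fr\u0435e \u03bfff\u0435r"        # mixed Cyrillic and Greek
print(deobfuscate(variant_a) == deobfuscate(variant_b))
```

Once every variant collapses to the same string, a conventional keyword or statistical filter only has to recognize that one form.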