Today's attacks against Bayesian spam filters attempt to keep the content of spam mails visible to humans but obscured to filters. A common technique is to fool filters by appending additional words to a spam mail. Because these words appear very rarely in spam mails, filters are inclined to classify the mail as legitimate. The idea we present in this paper leverages the fact that natural language typically contains synonyms, that is, different words that describe similar terms and concepts. Such words often have significantly different spam probabilities. Thus, an attacker might be able to penetrate Bayesian filters by replacing suspicious words with innocuous terms that have the same meaning. A precondition for the success of such an attack is that the Bayesian spam filters of different users assign similar spam probabilities to similar tokens. We first examine whether this precondition is met; afterwards, we measure the effectiveness of an automated substitution attack by creating a test set of spam messages and testing it against SpamAssassin, DSPAM, and Gmail.
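The substitution idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper, and it does not reproduce how SpamAssassin, DSPAM, or Gmail score mail. The per-token probabilities, the synonym table, the neutral value for unseen tokens, and the function names are all hypothetical, and the combining rule is a simple naive-Bayes-style formula assumed here for illustration.

```python
from math import prod

# Hypothetical per-token spam probabilities, invented for illustration only.
TOKEN_SPAM_PROB = {
    "buy": 0.80,
    "cheap": 0.95,
    "pills": 0.98,
    "purchase": 0.45,
    "inexpensive": 0.40,
    "tablets": 0.35,
}

# Hypothetical synonym table mapping suspicious words to innocuous ones.
SYNONYMS = {"buy": "purchase", "cheap": "inexpensive", "pills": "tablets"}

# Assumed probability for tokens the filter has never seen.
NEUTRAL = 0.5


def spam_score(tokens):
    """Combine token probabilities as prod(p) / (prod(p) + prod(1 - p))."""
    probs = [TOKEN_SPAM_PROB.get(token, NEUTRAL) for token in tokens]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def substitute(tokens):
    """Replace each token that has a listed synonym; keep the rest unchanged."""
    return [SYNONYMS.get(token, token) for token in tokens]


original = ["buy", "cheap", "pills", "today"]
rewritten = substitute(original)

print("original :", original, round(spam_score(original), 3))
print("rewritten:", rewritten, round(spam_score(rewritten), 3))
```

With these made-up numbers, the rewritten message scores well below the original although it conveys the same meaning. Whether the same holds for real filters depends on the precondition stated above, namely that the filters of different users assign similar probabilities to similar tokens.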