Spam filtering poses a special problem in text categorization: its defining characteristic is that filters face an active adversary who constantly attempts to evade them. Since spam evolves continuously and most practical applications rely on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers operating on character-level or binary sequences. Because messages are modeled as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, which makes the method very robust. The models are also fast to construct and can be updated incrementally. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.
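To make the idea of a compression model acting as a character-level classifier more concrete, the following is a minimal sketch and not the method evaluated in the paper. It swaps in a fixed-order character Markov model with additive smoothing in place of dynamic Markov compression or prediction by partial matching. The class name `CharMarkovModel`, the function `classify`, the model order, the smoothing constant, the assumed 256-symbol alphabet and the toy training messages are all illustrative assumptions. A message is assigned to the class whose model would encode it in fewer bits.

```python
import math
from collections import defaultdict


class CharMarkovModel:
    """Order-k character model with additive smoothing.

    Hypothetical stand-in for an adaptive compression model (not DMC or PPM).
    """

    def __init__(self, order=2, alphabet_size=256, alpha=0.5):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol inventory
        self.alpha = alpha                  # assumed smoothing constant
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def _pairs(self, text):
        # Pad the start so the first characters also have a full context.
        padded = "\0" * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def update(self, text):
        """Incrementally add one message to the model."""
        for context, symbol in self._pairs(text):
            self.counts[context][symbol] += 1
            self.totals[context] += 1

    def code_length(self, text):
        """Bits an ideal coder driven by this model would need for text."""
        bits = 0.0
        for context, symbol in self._pairs(text):
            seen = self.counts.get(context, {}).get(symbol, 0)
            total = self.totals.get(context, 0)
            p = (seen + self.alpha) / (total + self.alpha * self.alphabet_size)
            bits -= math.log2(p)
        return bits


def classify(message, spam_model, ham_model):
    """Pick the class whose model gives the shorter code length."""
    spam_bits = spam_model.code_length(message)
    ham_bits = ham_model.code_length(message)
    return "spam" if spam_bits < ham_bits else "ham"


# Toy training data, invented purely for illustration.
spam = CharMarkovModel()
ham = CharMarkovModel()
for text in ["cheap pills buy now", "buy cheap watches now", "win cash now!!!"]:
    spam.update(text)
for text in ["meeting moved to friday", "please review the draft", "lunch on friday?"]:
    ham.update(text)

print(classify("buy cheap pills", spam, ham))  # prints "spam"
```

No tokenizer appears anywhere in this sketch: the models see raw character sequences, and `update` can be called again whenever user feedback on a new message arrives, which mirrors the incremental setting described in the abstract.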