We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words and therefore present a challenge to traditional bag-of-words-based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially by using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.
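To make the idea of a compression-model-based filter concrete, the following is a minimal sketch and not the filters evaluated here. It assumes a general-purpose compressor (Python's zlib) as a stand-in for an adaptive compression model, and assigns a message to whichever class corpus lets it be encoded in fewer additional bytes. The function names (compressed_size, extra_bytes, classify) and the toy corpora in the usage note are hypothetical, chosen only for illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Additional compressed bytes needed to encode message after corpus.

    A small value means the message resembles text already seen in the
    corpus, so the compressor can describe it cheaply.
    """
    return compressed_size(corpus + "\n" + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Label the message with the class whose corpus compresses it best.

    Ties go to "ham" (an arbitrary choice made for this sketch).
    """
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "ham"
```

For example, with a toy spam corpus containing "Call 0800 123 456 to claim your cash reward today." and a toy ham corpus of ordinary conversational messages, classify would be expected to prefer the spam label for a message repeating that phrase, because the compressor can reference the earlier occurrence rather than encode the words afresh. Because the model works on the raw character stream, no tokenization or feature selection is required, which is one plausible reason such filters cope with messages of only a few words.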