Relaxed online SVMs for spam filtering

Authors:
D. Sculley; Gabriel M. Wachman
Affiliations:
Tufts University, Medford, MA (both authors)
Venue:
SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Year:
2007

Abstract

Spam is a key problem in electronic communication, including large-scale email systems and the growing number of blogs. Content-based filtering is one reliable method of combating this threat in its various forms, but some academic researchers and industrial practitioners disagree on how best to filter spam. The former have advocated Support Vector Machines (SVMs) for content-based filtering, as this machine learning methodology gives state-of-the-art performance for text classification. However, similar performance gains have yet to be demonstrated for online spam filtering. Additionally, practitioners cite the high computational cost of SVMs as a reason to prefer faster (if less statistically robust) Bayesian methods. In this paper, we offer a resolution to this controversy. First, we show that online SVMs do give state-of-the-art classification performance for online spam filtering on large benchmark data sets. Second, we show that nearly equivalent performance can be achieved by a Relaxed Online SVM (ROSVM) at greatly reduced computational cost. Our results are experimentally verified on email spam, blog spam, and splog (spam blog) detection tasks.
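In online filtering, each message is classified when it arrives, its true label is then revealed, and the model may be updated before the next message. The sketch below shows that loop with a linear SVM. It is an illustration under stated assumptions, not the authors' ROSVM: the hinge-loss model is trained with Pegasos-style stochastic subgradient steps (a swapped-in, cheap solver), and the binary character 4-gram features drawn from the first 3,000 characters, the regularization value `lam`, and the names `char_ngrams`, `OnlineLinearSVM`, and `run_online_filter` are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def char_ngrams(text, n=4, max_chars=3000):
    """Binary character n-gram features from the first max_chars characters.

    Hypothetical feature choice for illustration only.
    """
    text = text[:max_chars]
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class OnlineLinearSVM:
    """Linear SVM (hinge loss + L2) trained online with Pegasos-style steps.

    A stand-in online SVM for illustration; not the paper's ROSVM solver.
    The true weight vector is self.scale * self.w, so the L2 shrink applied
    at every step costs O(1) instead of touching every stored weight.
    """

    def __init__(self, lam=1e-4):
        self.lam = lam               # L2 regularization strength (assumed value)
        self.w = defaultdict(float)  # sparse, unscaled weights
        self.scale = 1.0
        self.t = 0                   # number of updates so far

    def score(self, feats):
        """Decision value f(x) for an L2-normalized binary feature vector."""
        if not feats:
            return 0.0
        norm = 1.0 / math.sqrt(len(feats))
        return self.scale * norm * sum(self.w.get(f, 0.0) for f in feats)

    def predict(self, feats):
        """+1 for spam, -1 for ham; a score of exactly 0 goes to ham."""
        return 1 if self.score(feats) > 0.0 else -1

    def update(self, feats, y):
        """One Pegasos step on (x, y) with y in {+1, -1}."""
        violated = y * self.score(feats) < 1.0  # hinge loss is active
        self.t += 1
        eta = 1.0 / (self.lam * self.t)
        shrink = 1.0 - 1.0 / self.t  # equals 1 - eta * lam
        if shrink <= 0.0:
            # First step: w is multiplied by zero, so start fresh.
            self.w.clear()
            self.scale = 1.0
        else:
            self.scale *= shrink
        if violated and feats:
            norm = 1.0 / math.sqrt(len(feats))
            step = eta * y * norm / self.scale  # stored weights are unscaled
            for f in feats:
                self.w[f] += step


def run_online_filter(stream, model):
    """Classify each message, then learn from its label; return the mistake count."""
    mistakes = 0
    for text, label in stream:
        feats = char_ngrams(text)
        if model.predict(feats) != label:
            mistakes += 1
        model.update(feats, label)
    return mistakes


if __name__ == "__main__":
    # Toy stream of (message, label) pairs; +1 = spam, -1 = ham.
    stream = [("buy cheap pills now", 1), ("meeting agenda attached", -1)] * 3
    model = OnlineLinearSVM()
    print("mistakes:", run_online_filter(stream, model))
    print("spam?", model.predict(char_ngrams("cheap pills")))
```

Keeping the weights as a scalar times a sparse dictionary means each step touches only the features present in the current message, which is what keeps a naive online SVM update cheap enough to run per message; it says nothing about how ROSVM itself reduces cost.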