Electronic mail has revolutionized traditional communication because it is convenient, economical, fast, and easy to use. A major bottleneck in electronic communication is the massive dissemination of unwanted and harmful messages, known as spam. A central concern is developing filters that reliably capture these emails and achieve high performance. Machine learning (ML) researchers have developed many approaches to tackle this problem. Within machine learning, support vector machines (SVM) have made a large contribution to spam email filtering, and different SVM-based schemes have been proposed through text classification (TC) approaches. A crucial problem when using SVM is the choice of kernel, since it directly affects the separation of emails in the feature space. This paper presents a thorough investigation of several distance-based kernels and characterizes their spam filtering behavior with SVM. Most kernels used in recent studies target continuous data and neglect the structure of the text. In contrast to these classical kernels, we propose the use of various string kernels for spam filtering and show how well they suit the problem. Data preprocessing is also a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. We detail feature mapping variants in TC that yield improved performance for the standard SVM on the filtering task. Furthermore, to cope with real-time scenarios, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time. We show that the active online method using string kernels achieves higher precision and recall rates.
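To illustrate what a string kernel computes, the following is a minimal sketch of a k-spectrum kernel, which compares two texts by their shared substrings of length k. The abstract does not say which string kernels the paper evaluates, so the choice of the spectrum kernel, the function names, and the default k = 3 are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def spectrum_features(text: str, k: int = 3) -> Counter:
    """Count every contiguous substring of length k in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of a and b.

    Illustrative choice of string kernel; not necessarily one used in the paper.
    """
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    if len(fa) > len(fb):
        fa, fb = fb, fa  # iterate over the smaller feature map
    return sum(count * fb[kmer] for kmer, count in fa.items())


def normalized_spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized kernel value, so message length does not dominate."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical example messages, not data from the paper.
    spam = "win a free prize now"
    also_spam = "free prize inside, win now"
    ham = "meeting moved to tuesday"
    print(normalized_spectrum_kernel(spam, also_spam))
    print(normalized_spectrum_kernel(spam, ham))
```

Because the kernel operates directly on character sequences, no bag-of-words vectorization is needed. A Gram matrix built from such pairwise values could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.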