In this paper we study supervised and semi-supervised classification of e-mails, considering two tasks: filing e-mails into folders and spam e-mail filtering. First, in a supervised learning setting, we investigate the use of random forest for automatically filing e-mails into folders and for spam filtering. We show that random forest is a good choice for these tasks: it runs fast on large, high-dimensional databases, is easy to tune, and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines and naive Bayes. We also introduce a new, accurate feature selector with linear time complexity. Second, we examine the applicability of the semi-supervised co-training paradigm to spam e-mail filtering, employing random forests, support vector machines, decision trees and naive Bayes as base classifiers. The study shows that a classifier trained on a small set of labelled examples can be successfully boosted with unlabelled examples to an accuracy rate only 5% lower than that of a classifier trained on all labelled examples. We also investigate the performance of co-training with a single natural feature split and show that, in the domain of spam e-mail filtering, it can be as competitive as co-training with two natural feature splits.
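The co-training loop described above can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the synthetic data, the 50/50 feature split standing in for the two natural views, the seed-set size, the number of rounds, and the per-round growth of 5 examples per view are all hypothetical parameters chosen for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for an e-mail corpus (hypothetical data, not the paper's).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
view_a, view_b = X[:, :10], X[:, 10:]     # two feature splits ("views")

L = np.arange(40)                         # small labelled seed set
U = np.arange(40, 500)                    # unlabelled pool
test = np.arange(500, 600)                # held-out evaluation set
y_work = y.copy()                         # labels assigned during co-training

for _ in range(10):                       # co-training rounds
    if len(U) == 0:
        break
    # One random forest per view, trained on the current labelled set.
    clf_a = RandomForestClassifier(n_estimators=50, random_state=0).fit(view_a[L], y_work[L])
    clf_b = RandomForestClassifier(n_estimators=50, random_state=0).fit(view_b[L], y_work[L])
    added = set()
    for clf, view in ((clf_a, view_a), (clf_b, view_b)):
        # Each classifier moves its 5 most confident unlabelled examples,
        # with its predicted labels, into the shared labelled set.
        conf = clf.predict_proba(view[U]).max(axis=1)
        top = U[np.argsort(conf)[-5:]]
        y_work[top] = clf.predict(view[top])
        added.update(top.tolist())
    L = np.concatenate([L, np.array(sorted(added))])
    U = np.array([i for i in U if i not in added])

# Evaluate the pair by averaging the two views' class probabilities.
proba = (clf_a.predict_proba(view_a[test]) + clf_b.predict_proba(view_b[test])) / 2
acc = float((proba.argmax(axis=1) == y[test]).mean())
print(f"co-training accuracy on held-out set: {acc:.2f}")
```

Co-training with a single natural feature split, as studied in the paper, would correspond to training both classifiers on the same view while still exchanging confident pseudo-labels.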