Machine learning and data mining can be used effectively to model, classify, and discover interesting information in a wide variety of data, including email. The Email Mining Toolkit (EMT) has been designed to provide a wide range of analyses for arbitrary email sources. Depending on the task, one can usually achieve very high accuracy, but at the cost of some false positives. In practice, false positives are often prohibitively expensive. In spam detection, for example, misclassifying even a single email may be unacceptable if that email is very important. Much work has been done to improve specific algorithms for detecting unwanted messages, but less work has been reported on leveraging multiple algorithms and correlating models in this particular domain of email analysis.

EMT has been updated with new correlation functions that allow the analyst to integrate a number of the user behavior models available in EMT's core technology. We present results of combining classifier outputs to both improve accuracy and reduce false positives for the problem of spam detection. We apply these methods to a very large email data set and show the results of different combination methods on this corpus. We introduce a new method for comparing multiple and combined classifiers, and show how it differs from past work. The method analyzes the relative gain and the maximum possible accuracy that can be achieved for certain combinations of classifiers in order to automatically choose the best combination.
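The idea of scoring classifier combinations by relative gain and maximum possible accuracy can be illustrated with a minimal sketch. It is not EMT's implementation, and the abstract does not define either quantity, so the definitions here are assumptions: "gain" is taken as the accuracy of a majority vote minus the accuracy of the best single member, and "maximum possible accuracy" as the fraction of messages that at least one member classifies correctly. The classifier names, predictions, and labels are made-up toy data (1 = spam, 0 = not spam).

```python
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def majority_vote(pred_lists):
    """Combine 0/1 predictions; a message is spam only on a strict majority."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*pred_lists)]


def ceiling_accuracy(pred_lists, labels):
    """Assumed 'maximum possible accuracy': at least one member is correct."""
    hits = sum(
        any(p == y for p in votes)
        for votes, y in zip(zip(*pred_lists), labels)
    )
    return hits / len(labels)


def evaluate_combinations(outputs, labels, size=3):
    """Score every combination of `size` classifiers and return the best one."""
    results = []
    for names in combinations(sorted(outputs), size):
        preds = [outputs[n] for n in names]
        best_single = max(accuracy(p, labels) for p in preds)
        combined = accuracy(majority_vote(preds), labels)
        results.append({
            "names": names,
            "combined": combined,
            "gain": combined - best_single,  # assumed definition of relative gain
            "ceiling": ceiling_accuracy(preds, labels),
        })
    return max(results, key=lambda r: r["combined"]), results


# Toy data, purely hypothetical.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
outputs = {
    "nb":    [1, 1, 0, 0, 1, 0, 0, 1],
    "svm":   [1, 0, 0, 0, 1, 0, 1, 0],
    "ngram": [0, 1, 0, 1, 1, 0, 1, 0],
    "tfidf": [0, 1, 1, 0, 0, 0, 0, 0],
}

best, all_results = evaluate_combinations(outputs, labels, size=3)
print(best["names"], best["combined"], best["gain"], best["ceiling"])
```

In this toy example the members of the selected combination make errors on different messages, which is why the vote can outperform each of them individually. A strict-majority rule is used so that ties never label a message as spam, reflecting the abstract's concern with false positives.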