We show that a set of independently developed spam filters may be combined in simple ways to provide substantially better filtering than any of the individual filters. The results of fifty-three spam filters evaluated at the TREC 2005 Spam Track were combined post-hoc so as to simulate the parallel on-line operation of the filters. The combined results were evaluated using the TREC methodology, yielding more than a factor of two improvement over the best filter. The simplest method -- averaging the binary classifications returned by the individual filters -- yields a remarkably good result. A new method -- averaging log-odds estimates based on the scores returned by the individual filters -- yields a somewhat better result, and provides input to SVM- and logistic-regression-based stacking methods. The stacking methods appear to provide further improvement, but only for very large corpora. Of the stacking methods, logistic regression yields the better result. Finally, we show that it is possible to select a priori small subsets of the filters that, when combined, still outperform the best individual filter by a substantial margin.
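The two simplest combination methods named above, averaging binary classifications and averaging log-odds estimates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how scores are converted to log-odds, so the smoothed count ratio in `log_odds`, the `eps` parameter, the convention that a higher score means more spam-like, and all function names are hypothetical choices made for this example.

```python
import math


def vote_average(verdicts):
    """Fraction of filters that labelled the message spam (1) rather than ham (0)."""
    return sum(verdicts) / len(verdicts)


def log_odds(score, history, eps=0.5):
    """Smoothed log-odds that a message with this score is spam, for one filter.

    history: (score, is_spam) pairs already seen for this filter.
    Assumption (not from the source): higher score means more spam-like, and
    the odds are estimated from counts of earlier messages, smoothed by eps.
    """
    spam_at_or_below = sum(1 for s, is_spam in history if is_spam and s <= score)
    ham_at_or_above = sum(1 for s, is_spam in history if not is_spam and s >= score)
    return math.log((spam_at_or_below + eps) / (ham_at_or_above + eps))


def fused_log_odds(scores, histories, eps=0.5):
    """Average the per-filter log-odds estimates for a single message."""
    estimates = [log_odds(s, h, eps) for s, h in zip(scores, histories)]
    return sum(estimates) / len(estimates)


# Toy data: one shared history of (score, is_spam) pairs, used for two filters.
history = [(0.1, False), (0.2, False), (0.8, True), (0.9, True)]

votes = vote_average([1, 1, 0, 1])                             # three of four filters say spam
agree = fused_log_odds([0.85, 0.85], [history, history])       # both filters lean spam
disagree = fused_log_odds([0.85, 0.15], [history, history])    # filters disagree
```

A positive fused value leans toward spam and a negative value toward ham. Because each filter's raw score is first mapped to log-odds, filters with incomparable score scales can be averaged directly, and the per-filter estimates could also serve as input features to a stacking learner such as logistic regression.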