Generating estimates of classification confidence for a case-based spam filter

Authors:
Sarah Jane Delany;Pádraig Cunningham;Dónal Doyle;Anton Zamolotskikh
Affiliations:
Dublin Institute of Technology, Dublin 8, Ireland;Trinity College, University of Dublin, Dublin 2, Ireland;Trinity College, University of Dublin, Dublin 2, Ireland;Trinity College, University of Dublin, Dublin 2, Ireland
Venue:
ICCBR'05 Proceedings of the 6th international conference on Case-Based Reasoning Research and Development
Year:
2005

Abstract

Producing estimates of classification confidence is surprisingly difficult. One might expect that classifiers that produce numeric classification scores (e.g. k-Nearest Neighbour, Naïve Bayes or Support Vector Machines) could readily yield confidence estimates by thresholding those scores. In fact, this proves not to be the case, probably because these are not probabilistic classifiers in the strict sense: the numeric scores they produce are not well correlated with classification confidence. In this paper we describe a case-based spam filtering application that would benefit significantly from the ability to attach confidence predictions to positive classifications (i.e. messages classified as spam). We show that 'obvious' confidence metrics for a case-based classifier are not effective. We propose an ensemble-like solution that aggregates a collection of confidence metrics and show that it is effective in this spam filtering domain.
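To make the idea of aggregating several confidence metrics concrete, the following is a minimal Python sketch. It is not the paper's method: the abstract does not name the individual metrics, their thresholds, or the aggregation rule, so the three measures (`avg_spam_similarity`, `spam_similarity_share`, `spam_vote_fraction`), the threshold values, and the "confident if any measure clears its threshold" rule are all assumptions chosen for illustration. Each neighbour is assumed to be a pair of a similarity in [0, 1] and a spam label, as a k-Nearest Neighbour retrieval might return.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A retrieved neighbour: (similarity to the target message in [0, 1], is_spam).
Neighbour = Tuple[float, bool]


def avg_spam_similarity(neighbours: Sequence[Neighbour]) -> float:
    """Hypothetical measure: mean similarity of the spam neighbours."""
    sims = [sim for sim, is_spam in neighbours if is_spam]
    return sum(sims) / len(sims) if sims else 0.0


def spam_similarity_share(neighbours: Sequence[Neighbour]) -> float:
    """Hypothetical measure: share of total similarity held by spam neighbours."""
    total = sum(sim for sim, _ in neighbours)
    spam = sum(sim for sim, is_spam in neighbours if is_spam)
    return spam / total if total > 0 else 0.0


def spam_vote_fraction(neighbours: Sequence[Neighbour]) -> float:
    """Hypothetical measure: fraction of neighbours labelled spam."""
    if not neighbours:
        return 0.0
    return sum(1 for _, is_spam in neighbours if is_spam) / len(neighbours)


@dataclass
class ThresholdedMeasure:
    """A confidence measure paired with the threshold it must reach."""
    name: str
    measure: Callable[[Sequence[Neighbour]], float]
    threshold: float

    def is_confident(self, neighbours: Sequence[Neighbour]) -> bool:
        return self.measure(neighbours) >= self.threshold


def aggregate_confidence(
    neighbours: Sequence[Neighbour],
    measures: Sequence[ThresholdedMeasure],
) -> bool:
    """Assumed aggregation rule: confident if ANY measure clears its threshold."""
    return any(m.is_confident(neighbours) for m in measures)


# Illustrative thresholds only; in practice they would be tuned on held-out
# data so that confident spam predictions contain no false positives.
MEASURES = [
    ThresholdedMeasure("avg_spam_similarity", avg_spam_similarity, 0.80),
    ThresholdedMeasure("spam_similarity_share", spam_similarity_share, 0.90),
    ThresholdedMeasure("spam_vote_fraction", spam_vote_fraction, 1.00),
]

clear_spam = [(0.90, True), (0.80, True), (0.85, True)]
borderline = [(0.60, True), (0.50, False), (0.40, True)]

clear_is_confident = aggregate_confidence(clear_spam, MEASURES)
borderline_is_confident = aggregate_confidence(borderline, MEASURES)
```

In this sketch a message whose neighbours are all highly similar spam is flagged as a confident spam prediction, while a message with mixed, weakly similar neighbours is not, even though a majority vote would still classify it as spam. Other aggregation rules (for example, requiring a majority of measures to agree) would trade coverage for precision differently.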