Calibrated lazy associative classification
Information Sciences: an International Journal
Classification is an important problem in data mining. Given an example x and a class c, a classifier typically works by estimating the probability of x being a member of c (i.e., the membership probability). Well-calibrated classifiers are those that provide accurate estimates of class membership probabilities; that is, the estimated probability p(c|x) is close to p(c|p(c|x)), the true empirical probability that x is a member of c given that the probability estimated by the classifier is p(c|x). Calibration is not a necessary property for producing accurate classifiers, and thus most research has focused on direct accuracy-maximization strategies (e.g., maximum margin) rather than on calibration. However, non-calibrated classifiers are problematic in applications where the reliability associated with a prediction must be taken into account (e.g., cost-sensitive classification, cautious classification, etc.). In these applications, a sensible use of the classifier must be based on the reliability of its predictions, and thus the classifier must be well calibrated. In this paper we show that lazy associative classifiers (LAC) are accurate and become well calibrated when a well-known, sound entropy-minimization method is applied. We explore important applications where these characteristics (i.e., accuracy and calibration) are relevant, and we demonstrate empirically that LAC drastically outperforms other classifiers, such as SVMs, Naive Bayes, and decision trees (even after these classifiers are calibrated by specific methods). Additional highlights of LAC include the ability to incorporate reliable predictions to improve training, and the ability to refrain from doubtful predictions.
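The calibration notion above, that the estimated p(c|x) should match the empirical frequency of c among examples receiving that estimate, can be checked with a simple reliability diagram. The sketch below (hypothetical data and a hypothetical `reliability_bins` helper, not code from the paper) bins predicted probabilities and compares the mean predicted probability in each bin with the empirical positive rate:

```python
def reliability_bins(probs, labels, n_bins=5):
    """For each non-empty probability bin, return
    (mean predicted probability, empirical positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Map p in [0, 1] to a bin index; clamp p == 1.0 into the last bin.
        i = min(int(p * n_bins), n_bins - 1)
        bins[i].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            emp = sum(y for _, y in b) / len(b)
            out.append((mean_p, emp, len(b)))
    return out

# Toy predictions: a well-calibrated classifier would have
# mean_p close to emp in every bin.
probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3]
labels = [0, 0, 1, 1, 1, 0]
for mean_p, emp, n in reliability_bins(probs, labels):
    print(f"predicted={mean_p:.2f}  empirical={emp:.2f}  n={n}")
```

Cost-sensitive or cautious classification uses exactly this correspondence: decisions (or abstentions) are driven by the estimated probability, so large gaps between the two columns make those decisions unreliable.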