Classification cost: An empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost

Authors:
Jungeun Kim; Keunho Choi; Gunwoo Kim; Yongmoo Suh
Affiliations:
437-070, Hyundai Autoever Corp. 576, Sam-dong, Uiwang-si, Gyeonggi-Do, Republic of Korea; 136-701, Business School, Korea University, Anam-dong 5-Ga, Sungbuk-Gu, Seoul, Republic of Korea; 305-719, 612, Department of Business Administration, College of Business and Economics, Hanbat National University, San 16-1, Dukmyung-dong, Yuseong-Gu, Daejeon, Republic of Korea; 136-701, Business School, Korea University, Anam-dong 5-Ga, Sungbuk-Gu, Seoul, Republic of Korea
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

Loan fraud is a critical factor in the insolvency of financial institutions, so companies try to reduce fraud losses by building models for proactive fraud prediction. However, two critical problems in fraud detection remain unresolved: (1) most prediction models are not cost-sensitive, making no distinction between the cost of a type I error and that of a type II error, and (2) the class distribution in datasets used for fraud detection is highly skewed because fraud-related data are sparse. The objective of this paper is to examine whether classification cost is affected both by the cost-sensitive approach and by the skewed class distribution. To that end, we compare the classification cost incurred by a traditional cost-insensitive classification approach with that incurred by two cost-sensitive classification approaches, Cost-Sensitive Classifier (CSC) and MetaCost. Experiments were conducted with a credit loan dataset from a major financial institution in Korea, varying the class distribution in the dataset and the number of input variables. The experiments showed that the lowest classification cost was incurred when the MetaCost approach was used and when non-fraud and fraud data were balanced. In addition, the dataset that included all delinquency variables proved most effective in reducing the classification cost.
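To make the notion of classification cost concrete, the following is a minimal sketch, not the authors' implementation of CSC or MetaCost. It only illustrates the minimum-expected-cost decision rule that cost-sensitive prediction generally relies on, and compares its total cost with that of a plain 0.5 probability threshold. The cost matrix, the labels, and the fraud probabilities are all hypothetical values chosen for illustration; none of them come from the paper.

```python
# Hypothetical cost matrix (illustrative values, not from the paper):
# COST[actual][predicted], with class 0 = non-fraud and class 1 = fraud.
COST = [
    [0.0, 1.0],  # actual non-fraud: correct = 0, false alarm = 1
    [5.0, 0.0],  # actual fraud: missed fraud = 5, correct = 0
]


def total_cost(actual, predicted, cost=COST):
    """Sum the misclassification cost over all (actual, predicted) pairs."""
    return sum(cost[a][p] for a, p in zip(actual, predicted))


def min_expected_cost_label(p_fraud, cost=COST):
    """Return the label whose expected cost is lowest given P(fraud)."""
    probs = [1.0 - p_fraud, p_fraud]
    expected = [
        sum(probs[a] * cost[a][pred] for a in range(2))
        for pred in range(2)
    ]
    return min(range(2), key=lambda pred: expected[pred])


# Hypothetical true labels and estimated fraud probabilities.
actual = [0, 0, 0, 1, 1]
p_fraud = [0.1, 0.3, 0.6, 0.4, 0.8]

# Cost-insensitive rule: predict fraud only when P(fraud) >= 0.5.
insensitive = [1 if p >= 0.5 else 0 for p in p_fraud]
# Cost-sensitive rule: predict the label with the lowest expected cost.
sensitive = [min_expected_cost_label(p) for p in p_fraud]

print("cost-insensitive total cost:", total_cost(actual, insensitive))
print("cost-sensitive total cost:  ", total_cost(actual, sensitive))
```

With these assumed numbers, the cost-sensitive rule flags more cases as fraud because a missed fraud is taken to be five times as costly as a false alarm, which lowers the total cost on this toy data. Whether the same holds on real data depends on the actual cost ratio and on the quality of the probability estimates.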