Consolidated tree classifier learning in a car insurance fraud detection domain with class imbalance

Authors:
Jesús M. Pérez;Javier Muguerza;Olatz Arbelaitz;Ibai Gurrutxaga;José I. Martín
Affiliations:
Dept. of Computer Architecture and Technology, University of the Basque Country, Donostia, Spain;Dept. of Computer Architecture and Technology, University of the Basque Country, Donostia, Spain;Dept. of Computer Architecture and Technology, University of the Basque Country, Donostia, Spain;Dept. of Computer Architecture and Technology, University of the Basque Country, Donostia, Spain;Dept. of Computer Architecture and Technology, University of the Basque Country, Donostia, Spain
Venue:
ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I
Year:
2005

Abstract

This paper presents an analysis of the behaviour of Consolidated Trees, CT (classification trees induced from multiple subsamples without losing their explaining capacity). We analyse how CT trees behave when used to solve a fraud detection problem in a car insurance company. This domain has two important characteristics. First, the explanation given for each classification is critical for investigating the reports or claims received. Second, the domain is a typical example of a class imbalance problem because of its skewed class distribution. In the results presented in the paper, CT and C4.5 trees are compared in terms of accuracy and structural stability (explaining capacity), and for both algorithms the best class distribution is searched for. Because different error types carry different costs (for example, the cost of investigating suspicious reports), a broader error analysis has also been carried out using precision/recall, ROC curves and related measures.
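The abstract does not detail how the multiple subsamples are built or combined. As a rough illustration only, the Python sketch below makes two assumptions. First, each subsample keeps every minority (fraud) example and undersamples the majority class to reach a chosen minority percentage, `pct_minority`. Second, at a given node each subsample proposes its best split feature, and the feature proposed by most subsamples is used for all of them. All function names are hypothetical. Plain information gain on categorical features stands in for C4.5's gain ratio.

```python
import math
import random
from collections import Counter


def make_subsamples(rows, labels, minority, pct_minority, n_subsamples, seed=0):
    """Build n_subsamples subsamples in which the minority class makes up
    (approximately) pct_minority of the examples.
    Assumption: every minority example is kept and the majority class is
    undersampled without replacement."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == minority]
    neg = [i for i, y in enumerate(labels) if y != minority]
    # Majority count needed so that len(pos) / (len(pos) + n_neg) == pct_minority.
    n_neg = min(len(neg), round(len(pos) * (1 - pct_minority) / pct_minority))
    subsamples = []
    for _ in range(n_subsamples):
        idx = pos + rng.sample(neg, n_neg)
        subsamples.append(([rows[i] for i in idx], [labels[i] for i in idx]))
    return subsamples


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, feature):
    """Information gain of a multiway split on a categorical feature."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def consolidated_feature(subsamples, features):
    """Assumption: each subsample votes for its highest-gain feature and the
    most-voted feature is used to split the node in every subsample."""
    votes = Counter(
        max(features, key=lambda f: info_gain(rows, labels, f))
        for rows, labels in subsamples
    )
    return votes.most_common(1)[0][0]
```

Under these assumptions, searching for the best class distribution amounts to calling `make_subsamples` with several values of `pct_minority` and comparing the resulting trees. A complete consolidated tree would repeat the vote at every node and partition all subsamples on the chosen feature, so a single tree, and hence a single explanation, is obtained.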
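False alarms (investigating legitimate claims) and missed frauds cost different amounts, so accuracy alone does not reflect them. The following minimal standard-library sketch computes the kinds of measures the abstract mentions, treating `"fraud"` as the positive class. It covers precision and recall from hard predictions, ROC points obtained by sweeping a threshold over classifier scores, and the area under that curve. For a tree, a score could be the fraud proportion in the reached leaf, though this is an assumption. It also includes a simple total cost using hypothetical per-error costs `cost_fp` and `cost_fn`. The function names are illustrative, not taken from the paper.

```python
def precision_recall(y_true, y_pred, positive="fraud"):
    """Precision and recall of the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def misclassification_cost(y_true, y_pred, cost_fp, cost_fn, positive="fraud"):
    """Total cost when false alarms and missed frauds are priced differently.
    cost_fp and cost_fn are hypothetical, user-supplied values."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return fp * cost_fp + fn * cost_fn


def roc_points(y_true, scores, positive="fraud"):
    """(FPR, TPR) points from sweeping a decision threshold down the scores."""
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs both classes present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For instance, take scores `[0.9, 0.8, 0.7, 0.3, 0.1]` for true labels fraud, legit, fraud, legit, legit. A 0.5 threshold gives precision 2/3 and recall 1.0, and the ROC area is 5/6.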