Cost-Sensitive Decision Tree Learning for Forensic Classification

Authors:
Jason V. Davis;Jungwoo Ha;Christopher J. Rossbach;Hany E. Ramadan;Emmett Witchel
Affiliations:
Dept. of Computer Sciences, The University of Texas at Austin;Dept. of Computer Sciences, The University of Texas at Austin;Dept. of Computer Sciences, The University of Texas at Austin;Dept. of Computer Sciences, The University of Texas at Austin;Dept. of Computer Sciences, The University of Texas at Austin
Venue:
ECML'06 Proceedings of the 17th European Conference on Machine Learning
Year:
2006

Abstract

In some learning settings, the cost of acquiring features for classification must be paid up front, before the classifier is evaluated. In this paper, we introduce the forensic classification problem and present a new algorithm for building decision trees that maximizes classification accuracy while minimizing total feature costs. By expressing the ID3 decision tree algorithm in an information-theoretic context, we derive our algorithm from a well-formulated problem objective. We evaluate our algorithm across several datasets and show that, for a given level of accuracy, our algorithm builds cheaper trees than existing methods. Finally, we apply our algorithm to a real-world system, Clarify. Clarify classifies unknown or unexpected program errors by collecting statistics at program run time, which are then used for decision tree classification after an error has occurred. We demonstrate that if the classifier used by the Clarify system is trained with our algorithm, the computational overhead (equivalently, total feature costs) can decrease by many orders of magnitude with only a slight (
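To make the trade-off described in the abstract concrete, the following is a minimal Python sketch of the standard ID3 information-gain criterion combined with a simple cost penalty. The penalty form (gain minus `lam` times the newly incurred feature cost) and the names `penalized_score`, `best_feature`, `costs`, `purchased`, and `lam` are illustrative assumptions; the abstract does not state the paper's actual objective, so this should not be read as the authors' algorithm. The one idea taken from the abstract is that feature costs are paid up front, so a feature already used elsewhere in the tree adds no further cost.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Standard ID3 information gain from splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def penalized_score(rows, labels, feature, costs, purchased, lam):
    """Hypothetical trade-off (an assumption, not the paper's objective):
    information gain minus lam times the *additional* up-front cost.
    A feature already in `purchased` costs nothing extra."""
    extra = 0.0 if feature in purchased else costs[feature]
    return information_gain(rows, labels, feature) - lam * extra


def best_feature(rows, labels, costs, purchased, lam):
    """Pick the feature with the highest penalized score."""
    return max(
        costs,
        key=lambda f: penalized_score(rows, labels, f, costs, purchased, lam),
    )
```

With `lam = 0` this reduces to plain ID3 feature selection; larger values of `lam` steer the split toward cheap or already-purchased features, at some loss in information gain.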