In some learning settings, the cost of acquiring features for classification must be paid up front, before the classifier is evaluated. In this paper, we introduce the forensic classification problem and present a new algorithm for building decision trees that maximizes classification accuracy while minimizing total feature cost. By expressing the ID3 decision tree algorithm in an information-theoretic context, we derive our algorithm from a well-formulated problem objective. We evaluate our algorithm across several datasets and show that, for a given level of accuracy, it builds cheaper trees than existing methods. Finally, we apply our algorithm to a real-world system, Clarify. Clarify classifies unknown or unexpected program errors by collecting statistics during program execution, which are then used for decision tree classification after an error has occurred. We demonstrate that if the classifier used by the Clarify system is trained with our algorithm, the computational overhead (equivalently, total feature cost) can decrease by many orders of magnitude with only a slight decrease in classification accuracy.
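The abstract does not reproduce the paper's exact objective, but the general idea of cost-sensitive splitting can be sketched. Below is a minimal Python illustration that scores each candidate feature by information gain divided by its acquisition cost, in the spirit of classic cost-sensitive heuristics from the decision tree literature. The gain/cost criterion, the function names, and the dict-based dataset layout are illustrative assumptions, not the algorithm derived in the paper.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def cost_sensitive_gain(rows, labels, feature, cost):
    # Information gain of splitting on `feature`, discounted by the
    # cost of acquiring that feature. This gain/cost trade-off is one
    # common heuristic, not the paper's derived objective.
    base = entropy(labels)
    n = len(labels)
    # Partition the labels by the feature's value in each example.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    conditional = sum(len(p) / n * entropy(p) for p in partitions.values())
    return (base - conditional) / cost

def best_split(rows, labels, costs):
    # Choose the feature with the highest cost-discounted gain.
    return max(costs, key=lambda f: cost_sensitive_gain(rows, labels, f, costs[f]))

For example, with rows = [{"a": 0, "b": 1}, {"a": 1, "b": 1}], labels = ["crash", "hang"], and costs = {"a": 1.0, "b": 10.0}, best_split returns "a": both features would need to be paid for before classification, so the cheaper, equally informative one is preferred. An ID3-style tree built with this criterion trades a potentially less pure split for a lower total feature bill, which is exactly the trade-off the forensic classification setting makes explicit.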