Problems with Precision: A Response to "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'"

Authors:
Tim Menzies;Alex Dekhtyar;Justin Distefano;Jeremy Greenwald
Affiliations:
IEEE;-;-;-
Venue:
IEEE Transactions on Software Engineering
Year:
2007

Citing 13
Cited 26

A decision-theoretic generalization of on-line learning and an application to boosting

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Recovering Traceability Links between Code and Documentation

IEEE Transactions on Software Engineering
Recovering documentation-to-source-code traceability links using latent semantic indexing

Proceedings of the 25th International Conference on Software Engineering
Cost-Sensitive Boosting In Software Quality Modeling

HASE '02 Proceedings of the 7th IEEE International Symposium on High Assurance Systems Engineering
Feature Identification: A Novel Approach and a Case Study

ICSM '05 Proceedings of the 21st IEEE International Conference on Software Maintenance
Advancing Candidate Link Generation for Requirements Tracing: The Study of Methods

IEEE Transactions on Software Engineering
The Detection and Classification of Non-Functional Requirements with Application to Early Aspects

RE '06 Proceedings of the 14th IEEE International Requirements Engineering Conference
Data Mining Static Code Attributes to Learn Defect Predictors

IEEE Transactions on Software Engineering
K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods

IEEE Transactions on Knowledge and Data Engineering
Make the Most of Your Time: How Should the Analyst Work with Automated Traceability Tools?

PROMISE '07 Proceedings of the Third International Workshop on Predictor Models in Software Engineering
Comments on "Data Mining Static Code Attributes to Learn Defect Predictors"

IEEE Transactions on Software Engineering
An empirical investigation of tree ensembles in biometrics and bioinformatics research

An empirical investigation of tree ensembles in biometrics and bioinformatics research
How good is your blind spot sampling policy

HASE'04 Proceedings of the Eighth IEEE international conference on High assurance systems engineering


Abstract

Zhang & Zhang (hereafter, the Zhangs) argue that the low-precision detectors seen in Menzies, Greenwald, and Frank's paper "Data Mining Static Code Attributes to Learn Defect Predictors" [13] (hereafter, DMP) are "not satisfactory for practical purposes". They demand that "a good prediction model should achieve both high Recall and high Precision" (which we will denote as "high precision & recall"). All other detectors, they argue, "may lead to impractical prediction models". We take a different view, and this short note explains why. Although we disagree with the Zhangs' conclusions, we find that their derived equation is an important result. Its insightful feature is that it can use information about the problem at hand to characterize the preconditions for detectors with both high precision and high recall. To the best of our knowledge, no such characterization has been previously reported (at least, not in the software engineering literature).
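
The abstract does not reproduce the Zhangs' equation, so the following is only a minimal sketch of the kind of relationship it describes, derived from the standard confusion-matrix definitions rather than quoted from either paper. It assumes that "information about the problem at hand" means the fraction of defective modules in the data; the function name, parameter names, and the numbers in the usage lines are hypothetical and chosen purely for illustration.

```python
def precision_from_rates(recall: float, pf: float, pos_fraction: float) -> float:
    """Precision implied by recall, false-alarm rate, and class distribution.

    Derived from the standard definitions (an assumption, not the source's text):
      recall    = TP / (TP + FN)
      pf        = FP / (FP + TN)
      precision = TP / (TP + FP)
    pos_fraction is the share of modules that are actually defective.
    """
    if not 0.0 < pos_fraction < 1.0:
        raise ValueError("pos_fraction must lie strictly between 0 and 1")
    if not (0.0 <= recall <= 1.0 and 0.0 <= pf <= 1.0):
        raise ValueError("recall and pf must lie in [0, 1]")

    # Express TP and FP as fractions of the whole data set.
    true_pos = recall * pos_fraction
    false_pos = pf * (1.0 - pos_fraction)
    if true_pos + false_pos == 0.0:
        raise ValueError("precision is undefined when nothing is predicted defective")
    return true_pos / (true_pos + false_pos)


# Illustrative values only: the same detector applied to a rare-defect
# data set and to a balanced one.
rare = precision_from_rates(recall=0.7, pf=0.25, pos_fraction=0.05)
balanced = precision_from_rates(recall=0.7, pf=0.25, pos_fraction=0.5)
print(f"precision with 5% defective modules:  {rare:.3f}")
print(f"precision with 50% defective modules: {balanced:.3f}")
```

Under these assumptions, a detector with fixed recall and false-alarm rate yields lower precision as defective modules become rarer, which is one way to see why the preconditions for "high precision & recall" depend on the data set as well as on the learner.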