Problems with Precision: A Response to "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'"

  • Authors:
  • Tim Menzies;Alex Dekhtyar;Justin Distefano;Jeremy Greenwald

  • Affiliations:
  • IEEE;-;-;-

  • Venue:
  • IEEE Transactions on Software Engineering
  • Year:
  • 2007

Abstract

Zhang & Zhang (hereafter, the Zhangs) argue that the low-precision detectors seen in Menzies, Greenwald, and Frank's paper Data Mining Static Code Attributes to Learn Defect Predictors [13] (hereafter, DMP) are "not satisfactory for practical purposes". They demand that "a good prediction model should achieve both high Recall and high Precision" (which we will denote as "high precision & recall"). All other detectors, they argue, "may lead to impractical prediction models". We have a different view, and this short note explains why. While we disagree with the Zhangs' conclusions, we find that their derived equation is an important result. The insightful feature of the Zhangs' equation is that it can use information about the problem at hand to characterize the pre-conditions for high precision and high recall detectors. To the best of our knowledge, no such characterization has been previously reported (at least, not in the software engineering literature).
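The abstract does not reproduce the Zhangs' equation, so the following is only a minimal sketch of the kind of relationship it describes, assuming the standard confusion-matrix definitions: precision = TP / (TP + FP), recall = TP / (all positives), and false alarm rate pf = FP / (all negatives). Under those definitions, precision is fixed once recall, pf, and the proportion of defective modules are known. The function name `implied_precision`, its parameters, and the numbers used are illustrative assumptions, not values taken from DMP or from the Zhangs' comment.

```python
def implied_precision(recall: float, pf: float, pos_fraction: float) -> float:
    """Precision implied by recall, false alarm rate (pf), and class balance.

    Follows from the standard definitions (an assumption of this sketch):
        TP = recall * pos_fraction
        FP = pf * (1 - pos_fraction)
        precision = TP / (TP + FP)
    """
    if not 0.0 < pos_fraction < 1.0:
        raise ValueError("pos_fraction must lie strictly between 0 and 1")
    if not (0.0 <= recall <= 1.0 and 0.0 <= pf <= 1.0):
        raise ValueError("recall and pf must lie in [0, 1]")

    true_pos = recall * pos_fraction          # share of all modules correctly flagged
    false_pos = pf * (1.0 - pos_fraction)     # share of all modules wrongly flagged
    if true_pos + false_pos == 0.0:
        raise ValueError("precision is undefined when nothing is flagged")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical detector: same recall and pf, different defect proportions.
    for pos_fraction in (0.5, 0.2, 0.05):
        p = implied_precision(recall=0.7, pf=0.25, pos_fraction=pos_fraction)
        print(f"pos_fraction={pos_fraction:.2f} -> precision={p:.3f}")
```

In this sketch the same hypothetical detector has much lower precision when defective modules are rare than when the classes are balanced, which illustrates how information about the problem at hand can constrain whether high precision and high recall are achievable together.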