Data Mining Static Code Attributes to Learn Defect Predictors

Authors:
Tim Menzies;Jeremy Greenwald;Art Frank
Affiliations:
IEEE;-;-
Venue:
IEEE Transactions on Software Engineering
Year:
2007



Abstract

The value of using static code attributes to learn defect predictors has been widely debated. Prior work has explored issues like the merits of "McCabes versus Halstead versus lines of code counts" for generating defect predictors. We show here that such debates are irrelevant since how the attributes are used to build predictors is much more important than which particular attributes are used. Also, contrary to prior pessimism, we show that such defect predictors are demonstrably useful and, on the data studied here, yield predictors with a mean probability of detection of 71 percent and a mean false alarm rate of 25 percent. These predictors would be useful for prioritizing a resource-bound exploration of code that has yet to be inspected.
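
The two figures quoted in the abstract are confusion-matrix ratios. As a minimal sketch, assuming the standard definitions (probability of detection is the fraction of defective modules that the predictor flags; false alarm rate is the fraction of defect-free modules that it flags), they can be computed as below. The function name and the module counts are hypothetical, chosen only so that the result matches the reported means; they are not data from the paper.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (pd, pf) from confusion-matrix counts.

    tp: defective modules flagged as defective
    fn: defective modules missed by the predictor
    fp: defect-free modules flagged as defective
    tn: defect-free modules correctly left unflagged
    """
    pd = tp / (tp + fn)  # probability of detection (recall)
    pf = fp / (fp + tn)  # probability of false alarm
    return pd, pf


# Hypothetical counts: 100 defective and 100 defect-free modules.
pd, pf = detection_rates(tp=71, fn=29, fp=25, tn=75)
print(f"pd = {pd:.0%}, pf = {pf:.0%}")
```

Read this way, a predictor at the reported means would flag roughly 71 of every 100 defective modules while also flagging about 25 of every 100 defect-free ones, which is the trade-off that matters when inspection effort is limited.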