On the value of learning from defect dense components for software defect prediction

  • Authors:
  • Hongyu Zhang, Adam Nelson, Tim Menzies

  • Affiliations:
  • Tsinghua University, Beijing, China; CS & EE, WVU, Morgantown; CS & EE, WVU, Morgantown

  • Venue:
  • Proceedings of the 6th International Conference on Predictive Models in Software Engineering

  • Year:
  • 2010

Abstract

BACKGROUND: Defect predictors learned from static code measures can isolate code modules with a higher than usual probability of defects.

AIMS: To improve those learners by focusing on the defect-rich portions of the training sets.

METHOD: The defect data sets CM1, KC1, MC1, PC1, and PC3 were separated into components. A subset of the projects (selected at random) was set aside for testing. Training sets were generated for a NaiveBayes classifier in two ways. In the dense treatment, only the components with a higher-than-median number of defective modules were used for training. In the standard treatment, modules from any component were used for training. Both treatments were run against the test set and evaluated using recall, probability of false alarm, and precision. In addition, under-sampling and over-sampling were performed on the defect data. Each method was repeated in a 10-by-10 cross-validation experiment.

RESULTS: Prediction models learned from defect-dense components outperformed the standard method as well as under-sampling and over-sampling. In statistical rankings based on recall, probability of false alarm, and precision, models learned from dense components won 4-5 times more often than any other method, and also lost the fewest times.

CONCLUSIONS: Given training data where most of the defects exist in a small number of components, better defect predictors can be trained from the defect-dense components.
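The dense treatment described in the abstract can be illustrated in a few lines. The sketch below is not the authors' implementation; it assumes each module row carries a component identifier, static code measures, and a binary defect label (all column names and the synthetic data are illustrative). It keeps only modules from components whose defective-module count exceeds the median across components, trains a Gaussian Naive Bayes model, and reports recall, probability of false alarm (pf = FP / (FP + TN)), and precision on a held-out test set.

```python
# Minimal sketch of the "dense treatment": train only on components whose
# defective-module count exceeds the median across components.
# Assumptions (not from the paper): pandas tables with columns
# 'component', static-code features, and a binary 'defective' label.
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

def dense_components(train: pd.DataFrame) -> pd.DataFrame:
    """Keep modules from components with above-median defect counts."""
    defects_per_component = train.groupby("component")["defective"].sum()
    median = defects_per_component.median()
    dense = defects_per_component[defects_per_component > median].index
    return train[train["component"].isin(dense)]

def evaluate(train: pd.DataFrame, test: pd.DataFrame, features: list[str]):
    """Train Naive Bayes; report recall, pf, and precision on the test set."""
    model = GaussianNB().fit(train[features], train["defective"])
    predicted = model.predict(test[features])
    tn, fp, fn, tp = confusion_matrix(
        test["defective"], predicted, labels=[0, 1]).ravel()
    recall = tp / (tp + fn)      # fraction of defective modules found
    pf = fp / (fp + tn)          # probability of false alarm
    precision = tp / (tp + fp)
    return recall, pf, precision

# Illustrative synthetic data: 300 modules across 10 components, with
# defect labels loosely correlated with the stand-in static code measures.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "component": rng.integers(0, 10, 300),
    "loc": rng.gamma(2.0, 50.0, 300),
    "complexity": rng.gamma(2.0, 5.0, 300),
})
score = data["loc"] + 10 * data["complexity"] + rng.normal(0, 40, 300)
data["defective"] = (score > score.median()).astype(int)

train, test = data.iloc[:200], data.iloc[200:]
features = ["loc", "complexity"]
print("standard:", evaluate(train, test, features))
print("dense:   ", evaluate(dense_components(train), test, features))
```

To mirror the paper's full protocol, this comparison would be wrapped in a 10-by-10 cross-validation loop and also run against under-sampled and over-sampled variants of the training set.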