Customization support for CBR-based defect prediction

Authors:
Elham Paikari;Bo Sun;Guenther Ruhe;Emadoddin Livani
Affiliations:
University of Calgary, Calgary, AB, Canada;University of Calgary, Calgary, AB, Canada;University of Calgary, Calgary, AB, Canada;University of Calgary, Calgary, AB, Canada
Venue:
Proceedings of the 7th International Conference on Predictive Models in Software Engineering
Year:
2011

Abstract

Background: The prediction performance of a case-based reasoning (CBR) model is influenced by the combination of four parameters: (i) the similarity function, (ii) the number of nearest-neighbor cases, (iii) the weighting technique used for attributes, and (iv) the solution algorithm. Each combination of these parameters is considered an instantiation of the general CBR-based prediction method. Selecting an instantiation for a new data set with specific characteristics (such as size, defect density, and language) is called customization of the general CBR method.

Aims: For the purpose of defect prediction, we address the question of which combinations of parameters work best in which situations. Three more specific questions were studied: (RQ1) Does one size fit all, that is, is one instantiation always the best? (RQ2) If not, which individual and combined parameter settings occur most frequently in generating the best prediction results? (RQ3) Are there context-specific rules to support the customization?

Method: In total, 120 different CBR instantiations were created and applied to 11 data sets from the PROMISE repository. Predictions were evaluated in terms of their mean magnitude of relative error (MMRE) and Pred(α), the percentage of objects fulfilling a prediction quality level α. For the third research question, dependency network analysis was performed.

Results: The parameter options occurring most frequently in the best-performing CBR instantiations were neural-network-based sensitivity analysis (as the weighting technique), unweighted average (as the solution algorithm), and the maximum number of nearest neighbors (as the number of nearest neighbors). Using dependency network analysis, a set of recommendations for customization was derived.

Conclusion: An approach to support customization is provided. It was confirmed that applying context-specific rules across groups of similar data sets is risky and produces poor results.
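To make the four parameters and the two evaluation measures concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The class `CBRInstantiation`, the helper functions, and the weighted Euclidean distance are hypothetical choices. The abstract does not give formulas for the measures, so MMRE is taken as the mean of |actual - predicted| / actual, and Pred(α) as the percentage of objects whose relative error is at most α. Both are commonly used definitions, and both require nonzero actual values.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Case = Tuple[Vector, float]  # (attribute values, observed defect count)


def weighted_euclidean(a: Vector, b: Vector, w: Vector) -> float:
    """Assumed distance measure: smaller distance means higher similarity."""
    return sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def unweighted_average(values: Sequence[float], distances: Sequence[float]) -> float:
    """Solution algorithm: plain mean of the neighbors' outcomes."""
    return sum(values) / len(values)


def inverse_distance_average(values: Sequence[float], distances: Sequence[float]) -> float:
    """Alternative solution algorithm: closer neighbors count more."""
    weights = [1.0 / (d + 1e-9) for d in distances]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


@dataclass(frozen=True)
class CBRInstantiation:
    """One combination of the four parameters named in the abstract."""
    distance: Callable[[Vector, Vector, Vector], float]  # (i) similarity function
    k: int                                               # (ii) number of nearest neighbors
    weights: Vector                                      # (iii) attribute weights
    solve: Callable[[Sequence[float], Sequence[float]], float]  # (iv) solution algorithm

    def predict(self, cases: Sequence[Case], query: Vector) -> float:
        # Rank all stored cases by distance to the query and keep the k closest.
        ranked = sorted(
            ((self.distance(attrs, query, self.weights), outcome) for attrs, outcome in cases),
            key=lambda pair: pair[0],
        )
        nearest = ranked[: self.k]
        return self.solve([y for _, y in nearest], [d for d, _ in nearest])


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean magnitude of relative error; assumes every actual value is nonzero."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pred(actual: Sequence[float], predicted: Sequence[float], alpha: float) -> float:
    """Percentage of objects whose relative error is at most alpha (assumed definition)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= alpha)
    return 100.0 * hits / len(actual)
```

In this sketch, customization amounts to constructing several `CBRInstantiation` objects that differ in one or more of the four fields, scoring each on held-out cases with `mmre` and `pred`, and keeping the best one for the data set at hand.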