Customization support for CBR-based defect prediction

  • Authors:
  • Elham Paikari;Bo Sun;Guenther Ruhe;Emadoddin Livani

  • Affiliations:
  • University of Calgary, Calgary, AB, Canada;University of Calgary, Calgary, AB, Canada;University of Calgary, Calgary, AB, Canada;University of Calgary, Calgary, AB, Canada

  • Venue:
  • Proceedings of the 7th International Conference on Predictive Models in Software Engineering
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Background: The prediction performance of a case-based reasoning (CBR) model is influenced by the combination of the following parameters: (i) similarity function, (ii) number of nearest neighbor cases, (iii) weighting technique used for attributes, and (iv) solution algorithm. Each combination of the above parameters is considered as an instantiation of the general CBR-based prediction method. The selection of an instantiation for a new data set with specific characteristics (such as size, defect density and language) is called customization of the general CBR method. Aims: For the purpose of defect prediction, we approach the question which combinations of parameters works best at which situation. Three more specific questions were studied: (RQ1) Does one size fit all? Is one instantiation always the best? (RQ2) If not, which individual and combined parameter settings occur most frequently in generating the best prediction results? (RQ3) Are there context-specific rules to support the customization? Method: In total, 120 different CBR instantiations were created and applied to 11 data sets from the PROMISE repository. Predictions were evaluated in terms of their mean magnitude of relative error (MMRE) and percentage Pred(α) of objects fulfilling a prediction quality level α. For the third research question, dependency network analysis was performed. Results: Most frequent parameter options for CBR instantiations were neural network based sensitivity analysis (as the weighting technique), un-weighted average (as the solution algorithm), and maximum number of nearest neighbors (as the number of nearest neighbors). Using dependency network analysis, a set of recommendations for customization was provided. Conclusion: An approach to support customization is provided. It was confirmed that application of context-specific rules across groups of similar data sets is risky and produces poor results.