An empirical evaluation of outlier deletion methods for analogy-based cost estimation

  • Authors:
  • Masateru Tsunoda; Takeshi Kakimoto; Akito Monden; Ken-ichi Matsumoto

  • Affiliations:
  • Nara Institute of Science and Technology, Kansai Science City, Japan; Kagawa National College of Technology, Chokushicho, Takamatsu-shi, Kagawa, Japan; Nara Institute of Science and Technology, Kansai Science City, Japan; Nara Institute of Science and Technology, Kansai Science City, Japan

  • Venue:
  • Proceedings of the 7th International Conference on Predictive Models in Software Engineering
  • Year:
  • 2011

Abstract

Background: Software project datasets sometimes include outliers, which degrade the accuracy of effort estimation. Outlier deletion methods are often used to eliminate them. However, few case studies have applied outlier deletion methods to analogy-based estimation, so it is not clear which method is most suitable for it. Aim: To clarify the effects on analogy-based estimation of existing outlier deletion methods (Cook's distance based deletion, LTS based deletion, k-means based deletion, Mantel's correlation based deletion, and EID based deletion) and of our method. Method: In the experiment, the outlier deletion methods were applied to three datasets (the ISBSG, Kitchenham, and Desharnais datasets), and their estimation accuracy was evaluated using BRE (Balanced Relative Error). Our method eliminates a project from the neighborhood of a target project when its effort is extremely different from that of the other neighbors. Results: The deletion methods designed for analogy-based estimation (i.e., Mantel's correlation based deletion, EID based deletion, and our method) showed stable performance. In particular, only our method improved the average BRE by more than 10% on two datasets. Conclusions: It is reasonable to apply deletion methods designed for analogy-based estimation, and preferable to apply our method.
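The abstract does not give formulas, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions. BRE is written in its commonly used form, the absolute error divided by the smaller of the actual and estimated effort. The neighborhood deletion step is illustrative only: the function name `estimate_effort`, the Euclidean similarity, the parameter `k`, and the rule "drop a neighbor whose effort is more than `ratio_limit` times away from the neighbors' median" are all hypothetical choices, not the criterion used by the authors.

```python
import math
from statistics import mean, median


def bre(actual, estimated):
    """Balanced Relative Error: absolute error over the smaller of the two efforts.

    Both values are assumed to be positive.
    """
    return abs(estimated - actual) / min(actual, estimated)


def estimate_effort(target, projects, k=3, ratio_limit=3.0):
    """Analogy-based estimate with a simple neighborhood outlier filter.

    projects: list of (feature_tuple, effort) pairs with positive efforts.
    ratio_limit: assumed threshold for "extremely different" effort
                 (hypothetical; not taken from the paper).
    """
    # Pick the k most similar past projects (Euclidean distance on raw
    # features; a real study would normalize the features first).
    neighbors = sorted(projects, key=lambda p: math.dist(target, p[0]))[:k]
    efforts = [effort for _, effort in neighbors]

    # Drop neighbors whose effort is far from the median of the neighborhood.
    mid = median(efforts)
    kept = [e for e in efforts if max(e, mid) / min(e, mid) <= ratio_limit]

    # The median neighbor always has ratio 1, so with ratio_limit >= 1
    # at least one neighbor is kept and the mean is defined.
    return mean(kept)


# Usage: the third neighbor (effort 900) is filtered out before averaging.
history = [
    ((1.0, 1.0), 100.0),
    ((1.1, 0.9), 110.0),
    ((0.9, 1.2), 900.0),
    ((5.0, 5.0), 400.0),
]
estimate = estimate_effort((1.0, 1.0), history)  # 105.0
error = bre(100.0, estimate)
```

Because BRE divides by the smaller value, it penalizes overestimates and underestimates symmetrically: estimating 200 for an actual effort of 100 and estimating 100 for an actual effort of 200 both give a BRE of 1.0.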