An empirical evaluation of outlier deletion methods for analogy-based cost estimation

Authors:
Masateru Tsunoda; Takeshi Kakimoto; Akito Monden; Ken-ichi Matsumoto
Affiliations:
Nara Institute of Science and Technology, Kansai Science City, Japan; Kagawa National College of Technology, Chokushicho, Takamatsu-shi, Kagawa, Japan; Nara Institute of Science and Technology, Kansai Science City, Japan; Nara Institute of Science and Technology, Kansai Science City, Japan
Venue:
Proceedings of the 7th International Conference on Predictive Models in Software Engineering
Year:
2011

Abstract

Background: Software project datasets sometimes include outliers, which degrade the accuracy of effort estimation. Outlier deletion methods are often used to eliminate them. However, few case studies have applied outlier deletion methods to analogy-based estimation, so it is not clear which method is most suitable for it. Aim: To clarify the effects on analogy-based estimation of existing outlier deletion methods (Cook's distance based deletion, LTS based deletion, k-means based deletion, Mantel's correlation based deletion, and EID based deletion) and of our method. Method: In the experiment, the outlier deletion methods were applied to three datasets (the ISBSG, Kitchenham, and Desharnais datasets), and the resulting estimation accuracy was evaluated using BRE (Balanced Relative Error). Our method eliminates a project from the neighborhood of a target project when its effort is extremely different from that of the other neighborhood projects. Results: Deletion methods designed for analogy-based estimation (i.e., Mantel's correlation based deletion, EID based deletion, and our method) showed stable performance. In particular, only our method improved the average BRE by over 10% on two datasets. Conclusions: It is reasonable to apply deletion methods designed for analogy-based estimation, and among them our method is preferable.
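
The evaluation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. BRE is written as it is commonly defined, |actual - estimate| / min(actual, estimate). The abstract gives no exact deletion rule, so the median/MAD rule, the threshold of 3.0, the Euclidean distance, the mean-of-neighbors estimate, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import mean, median


def bre(actual, estimate):
    """Balanced Relative Error: |actual - estimate| / min(actual, estimate)."""
    return abs(actual - estimate) / min(actual, estimate)


def nearest_neighbors(target, projects, k):
    """Return the k projects closest to `target` (Euclidean distance on features)."""
    return sorted(projects, key=lambda p: math.dist(p["features"], target))[:k]


def drop_effort_outliers(neighbors, threshold=3.0):
    """Assumed rule (not from the paper): drop neighbors whose effort lies more
    than `threshold` median absolute deviations (MAD) from the median effort."""
    efforts = [p["effort"] for p in neighbors]
    center = median(efforts)
    mad = median([abs(e - center) for e in efforts])
    if mad == 0:
        # All efforts (or most of them) coincide; nothing can be judged extreme.
        return list(neighbors)
    return [p for p in neighbors if abs(p["effort"] - center) / mad <= threshold]


def estimate_effort(target, projects, k=5, filter_outliers=True):
    """Analogy-based estimate: mean effort of the (optionally filtered) neighbors."""
    neighbors = nearest_neighbors(target, projects, k)
    if filter_outliers:
        neighbors = drop_effort_outliers(neighbors)
    return mean([p["effort"] for p in neighbors])


# Hypothetical toy data: one size-like feature and effort in person-hours.
PROJECTS = [
    {"features": (100,), "effort": 1000},
    {"features": (110,), "effort": 1100},
    {"features": (90,), "effort": 900},
    {"features": (105,), "effort": 5000},  # similar size, extreme effort
    {"features": (95,), "effort": 950},
    {"features": (400,), "effort": 4200},  # dissimilar, never among the 5 neighbors
]
TARGET = (100,)
ACTUAL_EFFORT = 1000

unfiltered = estimate_effort(TARGET, PROJECTS, filter_outliers=False)
filtered = estimate_effort(TARGET, PROJECTS, filter_outliers=True)
print(unfiltered, bre(ACTUAL_EFFORT, unfiltered))
print(filtered, bre(ACTUAL_EFFORT, filtered))
```

On this toy data the unfiltered estimate is 1790 (BRE 0.79), while removing the extreme neighbor gives 987.5 (BRE about 0.013). The numbers only show the mechanism and say nothing about the results reported in the paper.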