Software effort models should be assessed via leave-one-out validation

  • Authors:
  • Ekrem Kocaguneli; Tim Menzies

  • Affiliations:
  • CSEE, West Virginia University, Morgantown, USA (both authors)

  • Venue:
  • Journal of Systems and Software

  • Year:
  • 2013


Abstract

Context: More than half the literature on software effort estimation (SEE) focuses on model comparisons. Each comparison requires a sampling method (SM) to generate the train and test sets. Different authors use different SMs, such as leave-one-out (LOO), 3Way, and 10Way cross-validation. While LOO is a deterministic algorithm, the N-way methods use random selection to build their train and test sets. This introduces the problem of conclusion instability, where different authors rank effort estimators in different ways.

Objective: To reduce conclusion instability by removing the effects of a sampling method's random test case generation.

Method: Calculate bias and variance (B&V) values under the assumption that a learner trained on the whole dataset is the true model; then demonstrate that the B&V and runtime values for LOO are similar to those of the N-way methods by running 90 different algorithms on 20 different SEE datasets. For each algorithm, collect runtimes and B&V values under LOO, 3Way, and 10Way.

Results: We observed that (1) the majority of the algorithms have statistically indistinguishable B&V values under the different SMs and (2) the different SMs have similar runtimes.

Conclusion: In terms of their generated B&V values and runtimes, there is no reason to prefer N-way over LOO. In terms of reproducibility, LOO removes one cause of conclusion instability (the random selection of train and test sets). Therefore, we deprecate N-way and endorse LOO validation for assessing effort models.
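
The kind of B&V comparison the abstract describes can be illustrated with a minimal sketch. This is not the paper's exact experimental rig: it assumes scikit-learn's LeaveOneOut and KFold samplers, uses a single linear regression learner as a stand-in for the paper's 90 algorithms, fabricates a small toy dataset in place of the 20 SEE datasets, and uses a simplified B&V bookkeeping (the `bias_variance` helper is hypothetical). Following the abstract's assumption, the "true model" is the learner trained on the whole dataset.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, KFold
from sklearn.linear_model import LinearRegression

def bias_variance(X, y, sampler, learner=LinearRegression):
    """Estimate bias and variance of a learner under a sampling method,
    treating the learner trained on the whole dataset as the true model
    (the assumption stated in the abstract). Simplified illustration."""
    truth = learner().fit(X, y).predict(X)   # "true model" predictions
    preds = np.empty_like(y, dtype=float)    # out-of-sample predictions
    for train_idx, test_idx in sampler.split(X):
        model = learner().fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    bias = np.mean(preds - truth)            # systematic deviation from truth
    variance = np.var(preds - truth)         # spread of those deviations
    return bias, variance

# Hypothetical toy data: two project features mapped to effort.
rng = np.random.default_rng(0)
X = rng.random((30, 2))
y = X @ np.array([3.0, 5.0]) + rng.normal(0, 0.1, 30)

# Compare B&V under the three sampling methods named in the abstract.
for name, sm in [("LOO", LeaveOneOut()),
                 ("3Way", KFold(n_splits=3, shuffle=True, random_state=1)),
                 ("10Way", KFold(n_splits=10, shuffle=True, random_state=1))]:
    b, v = bias_variance(X, y, sm)
    print(f"{name:>5}: bias={b:+.4f}  variance={v:.4f}")
```

Note how LOO needs no random seed: its train/test splits are fully determined by the data, whereas the two KFold samplers only reproduce with a fixed `random_state`. That determinism is the reproducibility argument the conclusion makes for LOO.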